VP-Hype: A Hybrid Mamba-Transformer Framework with Visual-Textual Prompting for Hyperspectral Image Classification

📅 2026-03-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses hyperspectral image classification under extremely limited labeled samples and high-dimensional spectral data, a setting further complicated by the prohibitive quadratic complexity of conventional Transformers. The authors propose a hybrid architecture that integrates State Space Models (Mamba) with Transformers, augmented by a 3D-CNN spectral front-end and a vision–text dual-modality prompting mechanism, to capture long-range dependencies efficiently in low-data regimes. The dual-modal prompts provide context-aware guidance that, together with parameter-efficient adaptation, mitigates label scarcity. Evaluated with only 2% of samples used for training, the method achieves overall accuracies of 99.69% on Salinas and 99.45% on Longkou, outperforming existing approaches and establishing a new state of the art for data-scarce hyperspectral classification.

📝 Abstract
Accurate classification of hyperspectral imagery (HSI) is often frustrated by the tension between high-dimensional spectral data and the extreme scarcity of labeled training samples. While hierarchical models like LoLA-SpecViT have demonstrated the power of local windowed attention and parameter-efficient fine-tuning, the quadratic complexity of standard Transformers remains a barrier to scaling. We introduce VP-Hype, a framework that rethinks HSI classification by unifying the linear-time efficiency of State-Space Models (SSMs) with the relational modeling of Transformers in a novel hybrid architecture. Building on a robust 3D-CNN spectral front-end, VP-Hype replaces conventional attention blocks with a Hybrid Mamba-Transformer backbone to capture long-range dependencies with significantly reduced computational overhead. Furthermore, we address the label-scarcity problem by integrating dual-modal Visual and Textual Prompts that provide context-aware guidance for the feature extraction process. Our experimental evaluation demonstrates that VP-Hype establishes a new state of the art in low-data regimes. Specifically, with only 2% of samples used for training, the model achieves Overall Accuracy (OA) of 99.69% on the Salinas dataset and 99.45% on the Longkou dataset. These results suggest that the convergence of hybrid sequence modeling and multi-modal prompting provides a robust path forward for high-performance, sample-efficient remote sensing.
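The abstract's central efficiency argument — a linear-time SSM scan versus quadratic self-attention — can be illustrated with a toy sketch. This is plain NumPy, not the authors' implementation; both function names and the diagonal state matrix are illustrative simplifications of Mamba-style scans and Transformer attention:

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Diagonal state-space scan: h_t = A*h_{t-1} + B*x_t, y_t = C.h_t.

    Each step costs O(N) for state size N, so a length-L sequence costs
    O(L*N) — the linear-time behavior the abstract attributes to SSMs.
    """
    L, N = x.shape[0], A.shape[0]
    h = np.zeros(N)
    y = np.empty(L)
    for t in range(L):
        h = A * h + B * x[t]   # elementwise recurrence (diagonal A)
        y[t] = C @ h
    return y

def self_attention(X):
    """Single-head self-attention over an (L, D) sequence.

    The (L, L) score matrix is the quadratic-in-L cost that motivates
    replacing attention blocks with SSM scans on long spectral sequences.
    """
    scores = X @ X.T / np.sqrt(X.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)  # row-wise softmax
    return w @ X

# Toy usage on a short "spectral" sequence
y = ssm_scan(np.ones(8), A=np.full(4, 0.5), B=np.ones(4), C=np.ones(4))
out = self_attention(np.random.default_rng(0).standard_normal((16, 8)))
```

A hybrid backbone in the paper's sense would interleave blocks of both kinds, so most long-range mixing is paid at O(L) while a few attention layers retain pairwise relational modeling.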
Problem

Research questions and friction points this paper is trying to address.

hyperspectral image classification
label scarcity
high-dimensional spectral data
low-data regime
remote sensing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid Mamba-Transformer
Visual-Textual Prompting
State-Space Models
Hyperspectral Image Classification
Low-Data Regime
Abdellah Zakaria Sellam
Institute of Applied Sciences and Intelligent Systems (ISASI), CNR, 73100 Lecce, Italy; Dept. of Engineering for Innovation, University of Salento, 73100 Lecce, Italy
Fadi Abdeladhim Zidi
University of Biskra
Embedded systems electronics, artificial intelligence, agriculture
Salah Eddine Bekhouche
University of the Basque Country
Face Analysis
Ihssen Houhou
VSC Laboratory, Department of Electronics and Automation, University of Biskra, Algeria
Marouane Tliba
Institut Galilée, Université Sorbonne Paris Nord, F-93430, Villetaneuse, France
Cosimo Distante
CNR and University of Salento
Deep learning, pattern recognition, computer vision, robotics
Abdenour Hadid
Professor, Sorbonne Center for Artificial Intelligence (SCAI)
Artificial Intelligence, Computer Vision, LLMs, Healthcare, Autonomous Driving