Hyperspectral Image Land Cover Captioning Dataset for Vision Language Models

📅 2025-05-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing hyperspectral imaging (HSI) datasets are limited to coarse-grained classification and lack pixel-level semantic understanding. To address this, we introduce HyperCap—the first large-scale hyperspectral scene captioning dataset—enabling fine-grained, pixel-level textual descriptions of spectral imagery. We propose a hybrid annotation paradigm combining automated pre-screening with expert verification, covering four major HSI benchmarks and bridging the critical gap in large-scale vision-language alignment for remote sensing. Leveraging multi-source encoders—ViT for spatial features and SpectralFormer for spectral features—we systematically evaluate early/late fusion and cross-attention mechanisms. Experiments demonstrate that text-guided supervision significantly improves classification accuracy (average +5.2%), establishing a new benchmark and methodological foundation for hyperspectral vision-language modeling.
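The fusion strategies named above (early fusion, late fusion, cross-attention between spatial and spectral token streams) can be sketched minimally. This is an illustrative sketch, not the paper's implementation: the token counts, embedding size, and function names are assumptions, and real ViT/SpectralFormer features would replace the random arrays.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative feature dimensions (assumptions, not from the paper).
d = 64                                     # shared embedding size
spatial  = rng.standard_normal((16, d))    # e.g. ViT patch tokens
spectral = rng.standard_normal((16, d))    # e.g. SpectralFormer band tokens

def early_fusion(a, b):
    # Concatenate token features before any joint processing.
    return np.concatenate([a, b], axis=-1)           # (16, 2d)

def late_fusion(a, b):
    # Process streams independently, then merge pooled summaries.
    return np.concatenate([a.mean(axis=0), b.mean(axis=0)])  # (2d,)

def cross_attention(q, kv):
    # Single-head scaled dot-product attention: spatial queries
    # attend over spectral keys/values (softmax over kv tokens).
    scores = q @ kv.T / np.sqrt(q.shape[-1])         # (16, 16)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ kv                              # (16, d)

fused_early = early_fusion(spatial, spectral)
fused_late  = late_fusion(spatial, spectral)
fused_cross = cross_attention(spatial, spectral)
print(fused_early.shape, fused_late.shape, fused_cross.shape)
# -> (16, 128) (128,) (16, 64)
```

The key design difference is where the streams interact: early fusion merges before modeling, late fusion after independent modeling, and cross-attention lets one modality dynamically weight the other's tokens.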

📝 Abstract
We introduce HyperCap, the first large-scale hyperspectral captioning dataset, designed to enhance model performance in remote sensing applications. Unlike traditional hyperspectral imaging (HSI) datasets that focus solely on classification tasks, HyperCap integrates spectral data with pixel-wise textual annotations, enabling deeper semantic understanding of hyperspectral imagery and providing a valuable resource for tasks such as classification and feature extraction. HyperCap is constructed from four benchmark datasets and annotated through a hybrid approach that combines automated and manual methods to ensure accuracy and consistency. Empirical evaluations using state-of-the-art encoders and diverse fusion techniques demonstrate significant improvements in classification performance. These results underscore the potential of vision-language learning in HSI and position HyperCap as a foundational dataset for future research in the field.
Problem

Research questions and friction points this paper is trying to address.

Enhancing model performance in hyperspectral image captioning
Integrating spectral data with textual annotations for semantic understanding
Improving classification and feature extraction in remote sensing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates spectral data with textual annotations
Uses hybrid automated and manual annotation
Applies vision-language learning to hyperspectral imagery
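The hybrid annotation paradigm described above, automated pre-screening followed by expert verification, amounts to a confidence-routed review loop. The sketch below is a hypothetical illustration of that workflow: the captioner stub, confidence values, and threshold are all assumptions, not the paper's actual pipeline.

```python
def auto_caption(patch_id):
    # Stand-in for an automated captioner returning a draft caption
    # and a confidence score (values here are arbitrary for illustration).
    confidence = 0.95 if patch_id % 2 == 0 else 0.60
    return f"land-cover patch {patch_id}", confidence

def expert_verify(draft):
    # Stand-in for manual expert review; a real pipeline would queue
    # the draft for a human annotator and return the corrected caption.
    return draft

def annotate(patch_ids, threshold=0.9):
    captions = []
    for pid in patch_ids:
        draft, conf = auto_caption(pid)
        # High-confidence drafts pass through; low-confidence drafts
        # are routed to expert verification before being accepted.
        captions.append(draft if conf >= threshold else expert_verify(draft))
    return captions

print(annotate(range(4)))
```

Routing only low-confidence drafts to experts is what lets a hybrid scheme scale to four benchmark datasets while keeping annotation accuracy and consistency.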