HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

📅 2024-06-17

🏛️ arXiv.org

📈 Citations: 14

✨ Influential: 1

🤖 AI Summary

Existing hyperspectral image (HSI) methods are largely task- and scene-specific, which limits generalization and cross-domain transfer. HyperSIGMA is a Vision Transformer-based foundation model for HSI interpretation that scales to over one billion parameters. It rests on three contributions: (1) a sparse sampling attention (SSA) mechanism that counters spectral and spatial redundancy and promotes the learning of diverse contextual features; (2) a spectral enhancement module that integrates spatial and spectral features; and (3) HyperGlobal-450K, a pre-training dataset of about 450K hyperspectral images that significantly exceeds existing datasets in scale. Across high-level and low-level HSI tasks, the model outperforms current state-of-the-art methods, and it shows advantages in scalability, robustness, cross-modal transfer, real-world applicability, and computational efficiency.

📝 Abstract

Accurate hyperspectral image (HSI) interpretation is critical for providing valuable insights into various earth observation-related applications such as urban planning, precision agriculture, and environmental monitoring. However, existing HSI processing methods are predominantly task-specific and scene-dependent, which severely limits their ability to transfer knowledge across tasks and scenes, thereby reducing the practicality in real-world applications. To address these challenges, we present HyperSIGMA, a vision transformer-based foundation model that unifies HSI interpretation across tasks and scenes, scalable to over one billion parameters. To overcome the spectral and spatial redundancy inherent in HSIs, we introduce a novel sparse sampling attention (SSA) mechanism, which effectively promotes the learning of diverse contextual features and serves as the basic block of HyperSIGMA. HyperSIGMA integrates spatial and spectral features using a specially designed spectral enhancement module. In addition, we construct a large-scale hyperspectral dataset, HyperGlobal-450K, for pre-training, which contains about 450K hyperspectral images, significantly surpassing existing datasets in scale. Extensive experiments on various high-level and low-level HSI tasks demonstrate HyperSIGMA's versatility and superior representational capability compared to current state-of-the-art methods. Moreover, HyperSIGMA shows significant advantages in scalability, robustness, cross-modal transferring capability, real-world applicability, and computational efficiency. The code and models will be released at https://github.com/WHU-Sigma/HyperSIGMA.
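The abstract describes SSA only at a high level, so the following is a minimal sketch of the general idea of attending to a sparse subset of tokens, not the paper's mechanism. It assumes a fixed, evenly strided sampling pattern shared by all queries; the function name `sparse_sampled_attention` and the parameter `num_samples` are hypothetical and chosen for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sparse_sampled_attention(queries, keys, values, num_samples):
    """Scaled dot-product attention over a sparse subset of tokens.

    Assumption (not from the paper): every query attends to the same
    `num_samples` evenly spaced key/value positions instead of all tokens.
    Returns the attended outputs and the sampled token indices.
    """
    n = len(keys)
    if not 1 <= num_samples <= n:
        raise ValueError("num_samples must be between 1 and the token count")

    # Evenly strided token indices, e.g. n=8, num_samples=4 gives 0, 2, 4, 6.
    sampled = [(i * n) // num_samples for i in range(num_samples)]
    scale = 1.0 / math.sqrt(len(keys[0]))
    value_dim = len(values[0])

    outputs = []
    for q in queries:
        # Score the query against the sampled keys only.
        weights = softmax([dot(q, keys[j]) * scale for j in sampled])
        # Weighted sum of the sampled values.
        outputs.append([
            sum(w * values[j][c] for w, j in zip(weights, sampled))
            for c in range(value_dim)
        ])
    return outputs, sampled
```

With `n` tokens, dense attention scores `n * n` query-key pairs, while this sketch scores `n * num_samples`. Because each token attends to only a few positions, the redundant neighbors that dominate HSI data contribute less to the computation.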

Problem

Research questions and friction points this paper is trying to address.

Existing HSI methods are task-specific and scene-dependent, which limits knowledge transfer across tasks and scenes

Spectral and spatial redundancy in HSIs hinders the learning of diverse contextual features

Existing hyperspectral datasets are far smaller than the roughly 450K images assembled for pre-training

Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision transformer-based foundation model for HSI

Sparse sampling attention (SSA) mechanism that counters spectral and spatial redundancy

Large-scale dataset HyperGlobal-450K for pre-training

🔎 Similar Papers

SpectralEarth: Training Hyperspectral Foundation Models at Scale