HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

2024-06-17
arXiv.org
Citations: 14
Influential: 1
AI Summary
Existing hyperspectral image (HSI) methods are tightly coupled to specific tasks and scenes, which leads to poor generalization and limited cross-domain transferability. To address this, we propose HyperSIGMA, a universal foundation model for HSI, with three key contributions: (1) a sparse sampling attention (SSA) mechanism that counters spectral and spatial redundancy and promotes the learning of diverse contextual features; (2) a spectral enhancement module that integrates spatial and spectral features; and (3) HyperGlobal-450K, a pre-training dataset of about 450K hyperspectral images that significantly surpasses existing datasets in scale. Built on a Vision Transformer architecture and scalable to over 1 billion parameters, the model outperforms current state-of-the-art methods on both low-level (e.g., denoising, super-resolution) and high-level (e.g., classification, segmentation) HSI tasks. It also shows advantages in robustness, cross-modal transfer, real-world applicability, and computational efficiency.

๐Ÿ“ Abstract
Accurate hyperspectral image (HSI) interpretation is critical for providing valuable insights into various earth observation-related applications such as urban planning, precision agriculture, and environmental monitoring. However, existing HSI processing methods are predominantly task-specific and scene-dependent, which severely limits their ability to transfer knowledge across tasks and scenes, thereby reducing the practicality in real-world applications. To address these challenges, we present HyperSIGMA, a vision transformer-based foundation model that unifies HSI interpretation across tasks and scenes, scalable to over one billion parameters. To overcome the spectral and spatial redundancy inherent in HSIs, we introduce a novel sparse sampling attention (SSA) mechanism, which effectively promotes the learning of diverse contextual features and serves as the basic block of HyperSIGMA. HyperSIGMA integrates spatial and spectral features using a specially designed spectral enhancement module. In addition, we construct a large-scale hyperspectral dataset, HyperGlobal-450K, for pre-training, which contains about 450K hyperspectral images, significantly surpassing existing datasets in scale. Extensive experiments on various high-level and low-level HSI tasks demonstrate HyperSIGMA's versatility and superior representational capability compared to current state-of-the-art methods. Moreover, HyperSIGMA shows significant advantages in scalability, robustness, cross-modal transferring capability, real-world applicability, and computational efficiency. The code and models will be released at https://github.com/WHU-Sigma/HyperSIGMA.
Problem

Research questions and friction points this paper is trying to address.

Develops a foundation model for versatile hyperspectral image interpretation
Addresses spectral and spatial redundancy with sparse sampling attention
Creates large-scale dataset to enhance hyperspectral image pre-training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision transformer-based foundation model for HSI
Sparse sampling attention mechanism reduces redundancy
Large-scale dataset HyperGlobal-450K for pre-training
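
The idea behind sparse attention, as named in the list above, can be illustrated with a small sketch. The abstract does not describe how SSA chooses its sampling positions, so the code below is only a generic stand-in: each query attends to a few tokens drawn uniformly at random instead of all tokens. The function name `sparse_sampled_attention`, the `num_samples` parameter, and the uniform sampling rule are all assumptions made for illustration and are not taken from HyperSIGMA.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_sampled_attention(queries, keys, values, num_samples, seed=0):
    """Attend from each query to a small sampled subset of tokens.

    Hypothetical illustration: indices are drawn uniformly at random,
    which is an assumption and not the sampling rule used by HyperSIGMA.
    """
    rng = random.Random(seed)
    dim = len(keys[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in queries:
        # Pick num_samples distinct token positions for this query.
        picked = rng.sample(range(len(keys)), num_samples)
        scores = [scale * sum(a * b for a, b in zip(q, keys[i])) for i in picked]
        weights = softmax(scores)
        # Weighted sum of the sampled value vectors.
        out = [
            sum(w * values[i][d] for w, i in zip(weights, picked))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Toy input: 6 tokens with 4 features each, standing in for HSI patch tokens.
data_rng = random.Random(1)
tokens = [[data_rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(6)]
sparse_out = sparse_sampled_attention(tokens, tokens, tokens, num_samples=2)
dense_out = sparse_sampled_attention(tokens, tokens, tokens, num_samples=6)
print(len(sparse_out), len(sparse_out[0]))
```

With `num_samples` equal to the number of tokens, the sketch reduces to ordinary full attention; smaller values cut the per-query cost from all tokens to `num_samples` tokens, which is the kind of saving a sparse scheme aims for when neighbouring pixels and bands are highly redundant.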
Authors
Di Wang
Wuhan University, China
Meiqi Hu
Wuhan University, China
Yao Jin
Wuhan University, China
Yuchun Miao
School of Computer Science, Wuhan University
Image Processing, Remote Sensing, Large Language Model, RLHF, Machine Learning
Jiaqi Yang
Wuhan University, China
Yichu Xu
Wuhan University
Remote Sensing, Computer Vision, Deep Learning, AI4EO, Hyperspectral
Xiaolei Qin
Wuhan University
Remote sensing
Jiaqi Ma
Wuhan University, China
Lingyu Sun
Wuhan University, China
Chenxing Li
Wuhan University, China
Chuan Fu
Chongqing University, China
Hongruixuan Chen
The University of Tokyo, RIKEN
Deep Learning, Computer Vision, GeoAI, AI4EO, Multimodal Remote Sensing
Chengxi Han
Wuhan University, China
Naoto Yokoya
The University of Tokyo, RIKEN
Remote Sensing, Computer Vision, Machine Learning, Data Fusion
Jing Zhang
Minqiang Xu
National Engineering Research Center of Speech and Language Information Processing, China
Lin Liu
National Engineering Research Center of Speech and Language Information Processing, China
Lefei Zhang
School of Computer Science, Wuhan University
Pattern Recognition, Machine Learning, Image Processing, Remote Sensing
Chen Wu
Wuhan University, China
Bo Du
Wuhan University, China
Dacheng Tao
Nanyang Technological University
artificial intelligence, machine learning, computer vision, image processing, data mining
Liangpei Zhang
Wuhan University, China