AI Summary
Existing hyperspectral image (HSI) methods suffer from strong task- and scene-specific coupling, resulting in poor generalization and limited cross-domain transferability. To address this, we propose the first universal foundation model for HSI, introducing three key innovations: (1) a Sparse Sampling Attention (SSA) mechanism to efficiently capture long-range spatial-spectral dependencies; (2) a spectral enhancement module that deeply fuses spatial and spectral features; and (3) HyperGlobal-450K, the largest publicly available HSI pretraining dataset to date, enabling large-scale self-supervised pretraining and multi-task fine-tuning. Built upon a Vision Transformer architecture with over 1 billion parameters, our model achieves state-of-the-art performance across both low-level (e.g., denoising, super-resolution) and high-level (e.g., classification, segmentation) HSI tasks. It further demonstrates superior robustness, cross-modal transfer capability, and practical deployment efficiency.
Abstract
Accurate hyperspectral image (HSI) interpretation is critical for providing valuable insights into various earth observation-related applications such as urban planning, precision agriculture, and environmental monitoring. However, existing HSI processing methods are predominantly task-specific and scene-dependent, which severely limits their ability to transfer knowledge across tasks and scenes, thereby reducing their practicality in real-world applications. To address these challenges, we present HyperSIGMA, a vision transformer-based foundation model that unifies HSI interpretation across tasks and scenes, scalable to over one billion parameters. To overcome the spectral and spatial redundancy inherent in HSIs, we introduce a novel sparse sampling attention (SSA) mechanism, which effectively promotes the learning of diverse contextual features and serves as the basic block of HyperSIGMA. HyperSIGMA integrates spatial and spectral features using a specially designed spectral enhancement module. In addition, we construct a large-scale hyperspectral dataset, HyperGlobal-450K, for pre-training, which contains about 450K hyperspectral images, significantly surpassing existing datasets in scale. Extensive experiments on various high-level and low-level HSI tasks demonstrate HyperSIGMA's versatility and superior representational capability compared to current state-of-the-art methods. Moreover, HyperSIGMA shows significant advantages in scalability, robustness, cross-modal transfer capability, real-world applicability, and computational efficiency. The code and models will be released at https://github.com/WHU-Sigma/HyperSIGMA.
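The abstract does not spell out how sparse sampling attention (SSA) reduces the redundancy of full attention. The sketch below is an illustrative assumption, not the paper's implementation: each query token attends to a small sampled subset of `k` key positions rather than all `N`, cutting the attention cost from O(N²) to O(N·k). The function name, random-subset sampling rule, and all parameters are hypothetical.

```python
# Hypothetical sketch of sparse sampling attention (SSA).
# Assumption (not from the paper): each query attends to a random
# subset of k key positions instead of all N tokens.
import numpy as np

def sparse_sampling_attention(x, k=4, seed=0):
    """x: (N, d) array of token features; k: sampled keys per query."""
    rng = np.random.default_rng(seed)
    N, d = x.shape
    out = np.empty_like(x)
    for i in range(N):
        idx = rng.choice(N, size=k, replace=False)  # sparse key sample
        scores = x[idx] @ x[i] / np.sqrt(d)         # scaled dot-product
        w = np.exp(scores - scores.max())
        w /= w.sum()                                # softmax over the k keys
        out[i] = w @ x[idx]                         # weighted sum of sampled values
    return out

tokens = np.random.default_rng(1).normal(size=(16, 8))
y = sparse_sampling_attention(tokens, k=4)
print(y.shape)  # (16, 8)
```

In the actual model the sampled locations would presumably be learned rather than random, and the mechanism operates over spatial-spectral tokens inside a ViT block; this sketch only conveys the cost-reduction idea.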