SpectralEarth: Training Hyperspectral Foundation Models at Scale

📅 2024-08-15
🏛️ IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
📈 Citations: 18
Influential: 1
🤖 AI Summary
To address a critical bottleneck for foundation-model development in hyperspectral imagery—the lack of globally representative, multi-temporal, large-scale benchmark datasets—this work introduces SpectralEarth, the first large-scale multi-temporal hyperspectral pretraining dataset (415K locations, 538K image patches). The authors present a scalable self-supervised pretraining framework for hyperspectral foundation models, leveraging masked autoencoders (MAE) and SimCLR, and propose a spectral adapter architecture that embeds spectral-specific inductive biases into standard vision backbones. They also establish a unified benchmark for hyperspectral downstream tasks comprising nine diverse datasets. The method enables joint spectral-spatial modeling and cross-sensor generalization via fine-tuning. Experiments show the pretrained models significantly outperform supervised baselines on land-cover, crop, and tree-species classification; achieve roughly 40% higher fine-tuning efficiency; and exhibit strong robustness to cross-sensor domain shifts.
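The summary's central architectural idea—a spectral adapter that compresses the hyperspectral band dimension before a conventional vision backbone sees the data—can be sketched as a strided 1-D convolution along the spectral axis. The kernel width, stride, channel count, and band count (EnMAP has roughly 200 usable bands) below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def spectral_adapter(cube, weights, bias, stride=2):
    """Strided 1-D convolution along the spectral axis of a hyperspectral patch.

    cube:    (H, W, B)   hyperspectral image patch with B bands
    weights: (C_out, k)  1-D kernels shared across all pixels
    bias:    (C_out,)
    returns: (H, W, C_out, B_out) spectrally compressed features that a
             spatial backbone (e.g. a ViT) could consume downstream
    """
    H, W, B = cube.shape
    C_out, k = weights.shape
    n_out = (B - k) // stride + 1  # number of spectral positions after striding
    out = np.empty((H, W, C_out, n_out))
    for i in range(n_out):
        # Slide a width-k window over the band axis; same kernels at every pixel.
        window = cube[:, :, i * stride : i * stride + k]   # (H, W, k)
        out[:, :, :, i] = window @ weights.T + bias        # (H, W, C_out)
    return out

# Toy example: an 8x8 patch with 202 bands, compressed by 16 kernels of width 7.
cube = np.random.rand(8, 8, 202)
w = np.random.rand(16, 7) * 0.1
b = np.zeros(16)
feat = spectral_adapter(cube, w, b)
print(feat.shape)  # (8, 8, 16, 98)
```

Because the kernels operate only along the band axis, the adapter injects a spectral inductive bias (local correlations between neighboring wavelengths) while leaving spatial modeling to the unchanged vision backbone.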

📝 Abstract
Foundation models have triggered a paradigm shift in computer vision and are increasingly being adopted in remote sensing, particularly for multispectral imagery. Yet, their potential in hyperspectral imaging (HSI) remains untapped due to the absence of comprehensive and globally representative hyperspectral datasets. To close this gap, we introduce SpectralEarth, a large-scale multitemporal dataset designed to pretrain hyperspectral foundation models leveraging data from the Environmental Mapping and Analysis Program (EnMAP). SpectralEarth comprises 538 974 image patches covering 415 153 unique locations from 11 636 globally distributed EnMAP scenes spanning two years of archive. In addition, 17.5% of these locations include multiple timestamps, enabling multitemporal HSI analysis. Utilizing state-of-the-art self-supervised learning algorithms, we pretrain a series of foundation models on SpectralEarth, integrating a spectral adapter into classical vision backbones to accommodate the unique characteristics of HSI. In tandem, we construct nine downstream datasets for land-cover, crop-type mapping, and tree-species classification, providing benchmarks for model evaluation. Experimental results support the versatility of our models and their generalizability across different tasks and sensors. We also highlight computational efficiency during model fine-tuning.
Problem

Research questions and friction points this paper is trying to address.

Lack of comprehensive hyperspectral datasets for foundation models
Need for multitemporal hyperspectral imaging analysis capabilities
Challenges in adapting vision backbones for hyperspectral data characteristics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale hyperspectral dataset SpectralEarth introduced
Self-supervised learning for hyperspectral foundation models
Spectral adapter integrated into vision backbones
Nassim Ait Ali Braham
Data Science in Earth Observation, Technical University of Munich, Germany
C. Albrecht
Remote Sensing Technology Institute, German Aerospace Center, Germany
Julien Mairal
Inria - Univ. Grenoble Alpes
machine learning, artificial intelligence, optimization, computer vision, image processing
J. Chanussot
Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
Yi Wang
Data Science in Earth Observation, Technical University of Munich, Germany
Xiao Xiang Zhu
Technical University of Munich
Earth Observation, AI4EO, Signal Processing, Data Science, Synthetic Aperture Radar