PANGAEA: A Global and Inclusive Benchmark for Geospatial Foundation Models

📅 2024-12-05
🏛️ arXiv.org
📈 Citations: 9
Influential: 1
🤖 AI Summary
Existing geospatial foundation models (GFMs) suffer from geographic bias (e.g., overrepresentation of North America and Europe), inconsistent evaluation protocols, and narrow task coverage, hindering rigorous assessment of their global generalization capability. To address this, we introduce PANGAEA, the first global, multimodal, and geographically balanced benchmark for GFMs. It encompasses multi-scale, multi-sensor (e.g., Sentinel, Landsat, NAIP), and multi-temporal remote sensing data, supporting diverse downstream tasks including classification, semantic segmentation, and object detection. We propose a geographically inclusive evaluation framework enabling zero-shot and few-shot transfer, as well as dynamic benchmark expansion. Comprehensive evaluation across 30+ datasets reveals that GFM performance varies significantly across geographic regions and tasks, and that under low-label regimes GFMs do not consistently outperform supervised baselines. We publicly release all code and benchmark resources to foster reproducible, fair, and comparable GFM evaluation.

📝 Abstract
Geospatial Foundation Models (GFMs) have emerged as powerful tools for extracting representations from Earth observation data, but their evaluation remains inconsistent and narrow. Existing works often evaluate on downstream datasets and tasks that are too easy or too narrow, limiting how well the evaluations reflect the real-world applicability of GFMs. Current evaluation protocols also lack diversity: they fail to account for the multiplicity of image resolutions, sensor types, and temporalities, which further complicates the assessment of GFM performance. In particular, most existing benchmarks are geographically biased towards North America and Europe, calling into question the global applicability of GFMs. To overcome these challenges, we introduce PANGAEA, a standardized evaluation protocol that covers a diverse set of datasets, tasks, resolutions, sensor modalities, and temporalities, establishing a robust and widely applicable benchmark for GFMs. We evaluate the most popular openly available GFMs on this benchmark and analyze their performance across several domains. In particular, we compare these models to supervised baselines (e.g., UNet and vanilla ViT) and assess their effectiveness when faced with limited labeled data. Our findings highlight the limitations of GFMs under different scenarios, showing that they do not consistently outperform supervised models. PANGAEA is designed to be highly extensible, allowing the seamless inclusion of new datasets, models, and tasks in future research. By releasing the evaluation code and benchmark, we aim to enable other researchers to replicate our experiments and build upon our work, fostering a more principled evaluation protocol for large pre-trained geospatial models. The code is available at https://github.com/VMarsocci/pangaea-bench.
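The limited-label comparison described above can be sketched as a simple evaluation loop: each model (GFM or supervised baseline) is trained on progressively smaller fractions of the labeled training set and scored on a fixed test set. This is a minimal illustrative sketch, not PANGAEA's actual code; the function names (`subsample_labels`, `run_benchmark`, `train_and_eval`) and label fractions are assumptions for illustration.

```python
import random

def subsample_labels(indices, fraction, seed=0):
    """Keep only a fraction of labeled training samples, simulating a
    limited-label regime (e.g., 10%, 50%, 100% of the labels)."""
    rng = random.Random(seed)  # fixed seed keeps subsets reproducible
    k = max(1, int(len(indices) * fraction))
    return sorted(rng.sample(indices, k))

def run_benchmark(models, train_indices, fractions, train_and_eval):
    """Score every model under every label budget.

    `train_and_eval(model, subset)` is a placeholder for fine-tuning a
    GFM (or training a supervised baseline) on `subset` and returning a
    test-set metric such as mIoU.
    """
    results = {}
    for name, model in models.items():
        for frac in fractions:
            subset = subsample_labels(train_indices, frac)
            results[(name, frac)] = train_and_eval(model, subset)
    return results
```

The nested loop makes the comparison explicit: if a GFM's score degrades faster than the baseline's as the fraction shrinks, its pre-trained representations are not paying off under label scarcity.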
Problem

Research questions and friction points this paper is trying to address.

Inconsistent and narrow evaluation of Geospatial Foundation Models (GFMs)
Lack of diversity in evaluation protocols for GFMs
Geographic bias in existing benchmarks limits global applicability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Standardized evaluation protocol for diverse geospatial data
Compares GFMs with supervised models like UNet and ViT
Extensible benchmark for new datasets and tasks
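The extensibility point above is typically realized with a registry pattern: new datasets declare the metadata the protocol needs (sensor, resolution, task, region) so results stay comparable across entries. The sketch below is a hypothetical illustration of that pattern, not the actual pangaea-bench interface; the registry name, decorator, and example dataset are all invented for illustration.

```python
# Hypothetical dataset registry for an extensible benchmark.
DATASETS = {}

def register_dataset(name, **meta):
    """Class decorator that records a dataset loader together with the
    metadata the evaluation protocol needs for fair comparison."""
    def wrap(cls):
        DATASETS[name] = {"loader": cls, **meta}
        return cls
    return wrap

@register_dataset(
    "my-sentinel2-cropmap",          # hypothetical example entry
    sensor="Sentinel-2",
    resolution_m=10,
    task="semantic_segmentation",
    region="Sub-Saharan Africa",
)
class MySentinel2CropMap:
    """Placeholder loader; a real one would yield image/label pairs."""
    def __len__(self):
        return 0
```

Registering metadata alongside the loader is what lets the benchmark grow without fragmenting: any evaluation script can filter or group results by sensor, resolution, or region without dataset-specific code.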