🤖 AI Summary
Remote sensing super-resolution (SR) research has long been hindered by the high cost and limited availability of globally consistent, high-resolution satellite imagery.
Method: We introduce the first open-source, large-scale paired satellite dataset covering the full spectrum of Earth surface types, including underrepresented humanitarian hotspots and illicit mining zones. It comprises ~10,000 km² of co-registered SPOT 6/7 (1.5 m) and Sentinel-2 (10 m) time-series imagery. We propose three components: a global land-cover stratified sampling strategy with targeted augmentation for vulnerable regions; an open, extensible data construction framework, compatible with EO-learn, for automated multi-source co-registration and multi-frame SR modeling; and a lightweight, efficient SR baseline model with end-to-end training and inference tooling.
Results: Experiments demonstrate that models trained on this dataset significantly enhance Sentinel-2 analysis performance—approaching the fidelity of commercial high-resolution imagery—and effectively bridge the capability gap between public low-resolution and proprietary high-resolution remote sensing analytics.
📝 Abstract
Analyzing the planet at scale with satellite imagery and machine learning is a dream that has been constantly hindered by the cost of highly representative high-resolution imagery, which is difficult to access. To remedy this, we introduce the WorldStrat dataset. It is the largest and most varied publicly available dataset of its kind, at the Airbus SPOT 6/7 satellites' high resolution of up to 1.5 m/pixel. Empowered by the European Space Agency's Phi-Lab as part of the ESA-funded QueryPlanet project, we curate nearly 10,000 km² of unique locations to ensure stratified representation of all types of land use across the world: from agriculture to ice caps, from forests to multiple urbanization densities. We also enrich these with locations typically under-represented in ML datasets: sites of humanitarian interest, illegal mining sites, and settlements of persons at risk. We temporally match each high-resolution image with multiple low-resolution images from the freely accessible Sentinel-2 satellites at 10 m/pixel. We accompany this dataset with an open-source Python package to rebuild or extend the WorldStrat dataset, train and run inference with baseline algorithms, and learn from abundant tutorials, all compatible with the popular EO-learn toolbox. We thereby hope to foster broad-spectrum applications of ML to satellite imagery, and possibly to develop from free public low-resolution Sentinel-2 imagery the same power of analysis allowed by costly private high-resolution imagery. We illustrate this specific point by training and releasing several highly compute-efficient baselines on the task of Multi-Frame Super-Resolution. High-resolution Airbus imagery is licensed under CC BY-NC, the labels and Sentinel-2 imagery under CC BY, and the source code and pre-trained models under BSD. The dataset is available at https://zenodo.org/record/6810791 and the software package at https://github.com/worldstrat/worldstrat.
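To make the temporal-matching step concrete, the following is a minimal Python sketch of one way to pair a high-resolution acquisition with several low-resolution revisits: keep the revisits whose dates are closest to the high-resolution date. This is an illustration under stated assumptions, not the procedure implemented in the WorldStrat package. The function name `closest_revisits`, the parameter `n_frames`, and the example dates are all hypothetical.

```python
from datetime import date


def closest_revisits(hr_date, lr_dates, n_frames=8):
    """Return the n_frames low-resolution dates nearest in time to hr_date.

    Hypothetical helper: the selection rule (smallest absolute gap in days)
    is an assumption made for illustration only.
    """
    # Rank by absolute gap in days; break ties by earlier date so the
    # result is deterministic.
    ranked = sorted(lr_dates, key=lambda d: (abs((d - hr_date).days), d))
    # Return the selected revisits in chronological order.
    return sorted(ranked[:n_frames])


# Made-up acquisition dates, for illustration only.
hr_date = date(2021, 6, 15)
lr_dates = [
    date(2021, 5, 1),
    date(2021, 6, 10),
    date(2021, 6, 20),
    date(2021, 8, 1),
    date(2021, 6, 14),
]

matched = closest_revisits(hr_date, lr_dates, n_frames=3)
for d in matched:
    print(d.isoformat())
```

A multi-frame super-resolution model would then take the image stack for the selected dates as input and the single high-resolution image as target. With the resolutions stated above, the upscaling factor is 10 / 1.5, roughly 6.7 per axis.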