EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision

📅 2025-01-14
🤖 AI Summary
Problem: Large-scale self-supervised learning for remote sensing image understanding is hindered by the scarcity of high-quality, multi-source data and of domain-specific models. Method: The paper introduces EarthView, a global self-supervised pretraining dataset of 15 terapixels spanning 2017–2022. It integrates NEON, Sentinel, and Satellogic (1 m resolution) imagery, standardized into HuggingFace Parquet format, and establishes the first multimodal remote sensing self-supervised benchmark. The paper also proposes EarthMAE, a masked autoencoder that supports joint modeling of hyperspectral, multispectral, topographic, semantic segmentation, and time-series modalities. Cross-source registration and spatiotemporal alignment ensure data consistency. Contribution/Results: Experiments show that pretraining on the Satellogic subset alone improves performance on downstream tasks, supporting the value of this data for self-supervised learning and the transferability of the learned representations.

📝 Abstract
This paper presents EarthView, a comprehensive dataset specifically designed for self-supervision on remote sensing data, intended to enhance deep learning applications on Earth monitoring tasks. The dataset spans 15 terapixels of global remote-sensing data, combining imagery from a diverse range of sources, including NEON, Sentinel, and a novel release of 1 m spatial resolution data from Satellogic. Our dataset provides a wide spectrum of image data with varying resolutions, harnessed from different sensors and organized coherently into an accessible HuggingFace dataset in Parquet format. This data spans five years, from 2017 to 2022. Accompanying the dataset, we introduce EarthMAE, a tailored Masked Autoencoder, developed to tackle the distinct challenges of remote sensing data. Trained in a self-supervised fashion, EarthMAE effectively processes different data modalities such as hyperspectral, multispectral, topographical data, segmentation maps, and temporal structure. This model helps us show that pre-training on Satellogic data improves performance on downstream tasks. While there is still a gap to fill in MAE for heterogeneous data, we regard this innovative combination of an expansive, diverse dataset and a versatile model adapted for self-supervised learning as a stride forward in deep learning for Earth monitoring.
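
The abstract does not describe how EarthMAE masks its inputs, so the following is only a minimal sketch of the generic masked-autoencoder idea it builds on: split an image into patches, hide a random subset, and score reconstruction on the hidden patches only. The 4-band 64x64 tile, the 16-pixel patch size, the 0.75 mask ratio, the all-zero stand-in prediction, and the helper names `patchify`, `random_mask`, and `masked_mse` are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def patchify(image, patch):
    """Split an (H, W, C) array into (N, patch*patch*C) non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    gh, gw = h // patch, w // patch
    x = image.reshape(gh, patch, gw, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (gh, gw, patch, patch, c)
    return x.reshape(gh * gw, patch * patch * c)


def random_mask(num_patches, mask_ratio, rng):
    """Return (keep_idx, mask); mask[i] is True where patch i is hidden."""
    num_keep = int(round(num_patches * (1.0 - mask_ratio)))
    perm = rng.permutation(num_patches)
    keep_idx = np.sort(perm[:num_keep])
    mask = np.ones(num_patches, dtype=bool)
    mask[keep_idx] = False
    return keep_idx, mask


def masked_mse(pred, target, mask):
    """Mean squared error averaged over the hidden patches only."""
    per_patch = ((pred - target) ** 2).mean(axis=1)
    return float(per_patch[mask].mean())


rng = np.random.default_rng(0)
# Hypothetical 4-band tile; real EarthView tiles may differ in size and bands.
image = rng.random((64, 64, 4)).astype(np.float32)

patches = patchify(image, patch=16)          # 16 patches of length 16*16*4
keep_idx, mask = random_mask(len(patches), mask_ratio=0.75, rng=rng)
visible = patches[keep_idx]                  # what an encoder would see

# Stand-in for a decoder output: an all-zero prediction for every patch.
pred = np.zeros_like(patches)
loss = masked_mse(pred, patches, mask)
```

In a real model the encoder would process only `visible` and a decoder would produce `pred`; restricting the loss to hidden patches is what makes the objective self-supervised.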
Problem

Research questions and friction points this paper is trying to address.

Earth Observation Dataset
Machine Learning
Temporal Change Analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

EarthView
Self-supervised Learning
Hyper-scale Earth Observation Dataset
Diego Velazquez
Computer Vision Center, ServiceNow Research
Pau Rodríguez López
Computer Vision Center, Apple Research
Sergio Alonso
Satellogic
J. M. Gonfaus
Satellogic
Jordi Gonzàlez
Universitat Autònoma de Barcelona, Computer Vision Center
Computer Vision, Machine Learning, Artificial Intelligence
Gerardo Richarte
Satellogic
Javier Marin
Satellogic
Y. Bengio
Mila, Université de Montréal
Alexandre Lacoste
Staff Research Scientist, ServiceNow Research
machine learning