🤖 AI Summary
Pretrained remote sensing models are commonly constrained to fixed input modalities and spatial scales. This limits their adaptability to heterogeneous multi-sensor data and to Earth surface phenomena that occur at many scales. To address this, we propose the Galileo model family, which contributes: (1) a modality-agnostic encoder architecture that accepts inputs of varying sensor modalities and spatial and temporal dimensions; (2) a novel self-supervised learning approach that jointly learns large-scale global structure and fine-grained local detail, a combination not addressed by previous models; and (3) a unified spatiotemporal representation learning framework that generalizes across sensors and resolutions. Across diverse remote sensing tasks, with applications ranging from crop mapping to flood detection, Galileo achieves state-of-the-art results while reducing the labeled data required for individual tasks. Our work establishes a new paradigm for general-purpose remote sensing foundation models.
📝 Abstract
From crop mapping to flood detection, machine learning in remote sensing has a wide range of societally beneficial applications. Because the remote sensing data used in these applications share many commonalities, pretrained machine learning models tailored to remote sensing have the opportunity to reduce the labeled data and effort required to solve individual tasks. However, such models must be: (i) flexible enough to ingest input data of varying sensor modalities and shapes (i.e., of varying spatial and temporal dimensions), and (ii) able to model Earth surface phenomena of varying scales and types. To address this gap, we present Galileo, a family of pretrained remote sensing models designed to flexibly process multimodal remote sensing data. We also introduce a novel and highly effective self-supervised learning approach that learns both large- and small-scale features, a challenge not addressed by previous models. Our Galileo models obtain state-of-the-art results across diverse remote sensing tasks.