🤖 AI Summary
Existing Earth observation (EO) foundation models are typically tied to a fixed spectral sensor, look only at the Earth's surface, and ignore metadata beyond the imagery, which limits how well their representations generalize and how easily they scale to new modalities. To address this, we propose a unified EO foundation model covering both surface and atmospheric remote sensing. Our approach constructs a large-scale pretraining dataset of 18.7 million aligned images from all major Copernicus Sentinel missions; introduces a model built on extended dynamic hypernetworks with flexible metadata encoding, so that it can process any spectral or non-spectral sensor modality; and establishes Copernicus-Bench, a hierarchical evaluation benchmark of 15 downstream tasks ranging from preprocessing to mission-specific applications. Together, the dataset, model, and benchmark improve the scalability, versatility, and multimodal adaptability of EO foundation models and open opportunities to connect EO, weather, and climate research.
📝 Abstract
Advances in Earth observation (EO) foundation models have unlocked the potential of big satellite data to learn generic representations from space, benefiting a wide range of downstream applications crucial to our planet. However, most existing efforts remain limited to fixed spectral sensors, focus solely on the Earth's surface, and overlook valuable metadata beyond imagery. In this work, we take a step towards next-generation EO foundation models with three key components: 1) Copernicus-Pretrain, a massive-scale pretraining dataset that integrates 18.7M aligned images from all major Copernicus Sentinel missions, spanning from the Earth's surface to its atmosphere; 2) Copernicus-FM, a unified foundation model capable of processing any spectral or non-spectral sensor modality using extended dynamic hypernetworks and flexible metadata encoding; and 3) Copernicus-Bench, a systematic evaluation benchmark with 15 hierarchical downstream tasks ranging from preprocessing to specialized applications for each Sentinel mission. Our dataset, model, and benchmark greatly improve the scalability, versatility, and multimodal adaptability of EO foundation models, while also creating new opportunities to connect EO, weather, and climate research. Code, datasets, and models are available at https://github.com/zhu-xlab/Copernicus-FM.