OlmoEarth: Stable Latent Image Modeling for Multimodal Earth Observation

📅 2025-11-17
🤖 AI Summary
To address the spatial, temporal, and multi-source heterogeneity of remote sensing data, this paper introduces OlmoEarth, a multimodal, spatio-temporal foundation model tailored for Earth observation. The method, which the title calls stable latent image modeling, combines a self-supervised learning formulation, a masking strategy, and a loss function that are all designed for the Earth observation domain. The model also serves as the backbone of an end-to-end platform covering data collection, labeling, training, and inference. Compared against 12 other foundation models, OlmoEarth achieves the best performance on 15 of 24 tasks when its embeddings are evaluated, and on 19 of 29 tasks with full fine-tuning. The source code, training data, and pre-trained weights are openly released, and the platform is aimed at non-profits and NGOs.

📝 Abstract
Earth observation data presents a unique challenge: it is spatial like images, sequential like video or text, and highly multimodal. We present OlmoEarth: a multimodal, spatio-temporal foundation model that employs a novel self-supervised learning formulation, masking strategy, and loss all designed for the Earth observation domain. OlmoEarth achieves state-of-the-art performance compared to 12 other foundation models across a variety of research benchmarks and real-world tasks from external partners. When evaluating embeddings, OlmoEarth achieves the best performance on 15 out of 24 tasks, and with full fine-tuning it is the best on 19 of 29 tasks. We deploy OlmoEarth as the backbone of an end-to-end platform for data collection, labeling, training, and inference of Earth observation models. The OlmoEarth Platform puts frontier foundation models and powerful data management tools into the hands of non-profits and NGOs working to solve the world's biggest problems. OlmoEarth source code, training data, and pre-trained weights are available at https://github.com/allenai/olmoearth_pretrain.
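
The two headline results above are win counts, not accuracy scores. As a quick sanity check, they can be converted to win rates as follows. The helper name `win_rate` is illustrative and not taken from the paper or its code.

```python
def win_rate(wins, tasks):
    """Fraction of tasks on which a model ranks first."""
    if tasks <= 0:
        raise ValueError("tasks must be positive")
    return wins / tasks


# Counts taken from the abstract.
embedding = win_rate(15, 24)  # evaluation of embeddings
finetune = win_rate(19, 29)   # evaluation with full fine-tuning

print(f"embeddings: {embedding:.1%}, fine-tuning: {finetune:.1%}")
# prints "embeddings: 62.5%, fine-tuning: 65.5%"
```

The two task sets differ in size (24 versus 29), so the rates describe each evaluation setting separately and should not be averaged into one number.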

Problem

Research questions and friction points this paper is trying to address.

Modeling multimodal Earth observation data with spatial and sequential characteristics
Developing self-supervised learning methods for Earth observation domain challenges
Creating a foundation model platform for real-world environmental applications

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised learning for Earth observation data
Novel masking strategy and loss formulation
End-to-end platform for data and model management
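
This page does not describe the masking strategy itself, so the sketch below is only a generic illustration of what masking means when the input is a grid of tokens indexed by time, modality, and spatial patch. Uniform random masking is used as a stand-in and should not be read as OlmoEarth's strategy. The function name `sample_mask`, the modality labels, and the 50% ratio are all assumptions made for the example.

```python
import random


def sample_mask(num_timesteps, modalities, num_patches, mask_ratio=0.5, seed=0):
    """Randomly hide a fraction of (timestep, modality, patch) tokens.

    Returns (visible, masked) as two lists of token index tuples.
    Uniform random masking is a placeholder, not the paper's method.
    """
    rng = random.Random(seed)
    # Enumerate every token in the time x modality x space grid.
    tokens = [
        (t, m, p)
        for t in range(num_timesteps)
        for m in modalities
        for p in range(num_patches)
    ]
    num_masked = round(mask_ratio * len(tokens))
    masked = set(rng.sample(tokens, num_masked))
    visible = [tok for tok in tokens if tok not in masked]
    return visible, sorted(masked)


# Hypothetical setup: 4 timesteps, 2 modalities, 9 patches per image.
visible, masked = sample_mask(4, ["optical", "radar"], 9, mask_ratio=0.5)
print(len(visible), len(masked))  # prints "36 36"
```

In a masked modeling setup, an encoder would see only the `visible` tokens and the training loss would be computed on predictions for the `masked` ones. A domain-specific strategy would change how `masked` is chosen, for example by hiding whole timesteps or whole modalities instead of independent tokens.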

Authors

Henry Herzog
Allen Institute for AI
Favyen Bastani
MIT CSAIL
Yawen Zhang
Ai2
Machine Learning, AI for Earth
Gabriel Tseng
Allen Institute for AI
Joseph Redmon
Allen Institute for AI
Hadrien Sablon
Allen Institute for AI
Ryan Park
Allen Institute for AI, University of Washington
Jacob Morrison
Allen Institute for AI
natural language processing
Alexandra Buraczynski
Allen Institute for AI
Karen Farley
Allen Institute for AI
Joshua Hansen
Allen Institute for AI
Andrew Howe
Allen Institute for AI
Patrick Alan Johnson
Allen Institute for AI
Mark Otterlee
Allen Institute for AI
Ted Schmitt
Allen Institute for AI
Hunter Pitelka
Allen Institute for AI
Stephen Daspit
Allen Institute for AI
Rachel Ratner
Allen Institute for AI
Christopher Wilhelm
Allen Institute for AI
Sebastian Wood
Allen Institute for AI
Mike Jacobi
Allen Institute for AI
Hannah Kerner
Arizona State University
Evan Shelhamer
UBC / Vector Institute / CIFAR AI Chair
computer vision, machine learning, deep learning
Ali Farhadi
Allen Institute for AI, University of Washington
Ranjay Krishna
University of Washington, Allen Institute for AI
Computer Vision, Natural Language Processing, Machine Learning, Human Computer Interaction