🤖 AI Summary
This work addresses two problems: modeling multimodal, multitemporal Earth observation data with variable-length inputs, and the poor performance of existing foundation models on temporal tasks such as natural disaster risk prediction. We propose TerraFlow, an approach built on temporal training objectives that enable sequence-aware learning jointly across space, time, and modality while remaining robust to variable-length input sequences. On the GEO-Bench-2 benchmark, TerraFlow outperforms state-of-the-art Earth observation foundation models on all temporal tasks, by up to 50% in F1 score and 24% in Brier score. It also takes initial steps toward deep-learning-based risk map prediction for natural disasters, a task on which other state-of-the-art foundation models frequently collapse.
📝 Abstract
We propose TerraFlow, a novel approach to multimodal, multitemporal learning for Earth observation. TerraFlow builds on temporal training objectives that enable sequence-aware learning across space, time, and modality, while remaining robust to the variable-length inputs commonly encountered in real-world Earth observation data. Our experiments show that TerraFlow outperforms state-of-the-art foundation models for Earth observation on all temporal tasks of the GEO-Bench-2 benchmark. We additionally show that TerraFlow takes initial steps towards deep-learning-based risk map prediction for natural disasters, a task on which other state-of-the-art foundation models frequently collapse. TerraFlow outperforms state-of-the-art foundation models by up to 50% in F1 score and 24% in Brier score.