AI Summary
This study addresses the limitation of existing remote sensing foundation models, which predominantly rely on high-resolution imagery with low revisit rates and thus struggle to support real-time monitoring of rapidly evolving phenomena and sudden disasters. To overcome this, the authors present the first foundation model tailored for high-frequency Earth observation, built upon the SatMAE framework and trained on over 2 TB of high-temporal-resolution multispectral data from Meteosat Second Generation's SEVIRI instrument. By incorporating fine-grained temporal encoding, the model enhances its capacity to capture short-term spatiotemporal dynamics. Evaluated on cloud masking and active fire detection tasks, the proposed approach significantly outperforms both conventional methods and current remote sensing foundation models, achieving a superior balance between accuracy and Intersection-over-Union (IoU), thereby demonstrating the potential of geostationary satellite data for real-time disaster monitoring.
Abstract
The increasing frequency and severity of climate-related disasters have intensified the need for real-time monitoring, early warning, and informed decision-making. Earth Observation (EO), powered by satellite data and Machine Learning (ML), offers powerful tools to meet these challenges. Foundation Models (FMs) have revolutionized EO ML by enabling general-purpose pretraining on large-scale remote sensing datasets. However, most existing models rely on high-resolution satellite imagery with low revisit rates, limiting their suitability for fast-evolving phenomena and time-critical emergency response. In this work, we present HighFM, a first-cut approach towards an FM for high-temporal-resolution, multispectral EO data. Leveraging over 2 TB of SEVIRI imagery from the Meteosat Second Generation (MSG) platform, we adapt the SatMAE masked autoencoding framework to learn robust spatiotemporal representations. To support real-time monitoring, we enhance the original architecture with fine-grained temporal encodings that capture short-term variability. The pretrained models are then fine-tuned on cloud masking and active fire detection tasks. We benchmark our SEVIRI-pretrained Vision Transformers against traditional baselines and recent geospatial FMs, demonstrating consistent gains in both balanced accuracy and IoU. Our results highlight the potential of temporally dense geostationary data for real-time EO, offering a scalable path toward foundation models for disaster detection and tracking.
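To make the "fine-grained temporal encodings" idea concrete: SatMAE-style models attach sinusoidal encodings of timestamp components to each patch embedding, and one plausible way to extend this to SEVIRI's 15-minute repeat cycle is to encode the intra-day time alongside the date, so that consecutive acquisition slots receive distinct encodings. The sketch below is illustrative only; the function names, the day-of-year/minute-of-day split, and the dimensions are assumptions, not the paper's exact design.

```python
import numpy as np

def sincos_encoding(value, dim, max_period=10000.0):
    """Standard 1-D sinusoidal positional encoding of a scalar value
    (same construction as in Transformer positional embeddings)."""
    half = dim // 2
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    angles = value * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

def temporal_encoding(day_of_year, minute_of_day, dim=128):
    """Hypothetical fine-grained temporal encoding: half the channels
    encode the date, half the intra-day time, so two SEVIRI slots
    15 minutes apart map to different vectors (a date-only encoding
    would collapse them)."""
    part = dim // 2
    return np.concatenate([
        sincos_encoding(day_of_year, part),
        sincos_encoding(minute_of_day, part),
    ])

# Two acquisitions on the same day, 15 minutes apart:
e1 = temporal_encoding(day_of_year=172, minute_of_day=600)
e2 = temporal_encoding(day_of_year=172, minute_of_day=615)
assert e1.shape == (128,)
assert not np.allclose(e1, e2)  # the intra-day half distinguishes them
```

In a masked-autoencoder pretraining loop, a vector like this would typically be added to (or concatenated with) each patch embedding before the encoder, letting the model condition its reconstructions on when each frame was acquired.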