InstaGeo: Compute-Efficient Geospatial Machine Learning from Data to Deployment

📅 2025-10-07
🤖 AI Summary
Geospatial foundation models (GFMs) face two deployment bottlenecks: the absence of automated data pipelines for raw satellite imagery and the large size of fine-tuned models. This paper presents InstaGeo, an open-source, end-to-end geospatial machine learning framework that combines automated data curation, task-specific model distillation, and deployment as interactive web-map applications, built around open-access multispectral imagery (e.g., Landsat 8-9, Sentinel-2). The distilled models are up to 8× smaller than standard fine-tuned counterparts, with lower FLOPs and CO2 emissions and minimal accuracy loss. On datasets reproduced from three published studies, the trained models show marginal mIoU differences of -0.73 pp for flood mapping, -0.20 pp for crop segmentation, and +1.79 pp for desert locust prediction. On a larger crop segmentation dataset curated with InstaGeo, the model reaches 60.65% mIoU, a 12 pp improvement over prior baselines. The full workflow, from raw imagery to a deployed web-map application, fits within a single working day.

📝 Abstract
Open-access multispectral imagery from missions like Landsat 8-9 and Sentinel-2 has fueled the development of geospatial foundation models (GFMs) for humanitarian and environmental applications. Yet, their deployment remains limited by (i) the absence of automated geospatial data pipelines and (ii) the large size of fine-tuned models. Existing GFMs lack workflows for processing raw satellite imagery, and downstream adaptations often retain the full complexity of the original encoder. We present InstaGeo, an open-source, end-to-end framework that addresses these challenges by integrating: (1) automated data curation to transform raw imagery into model-ready datasets; (2) task-specific model distillation to derive compact, compute-efficient models; and (3) seamless deployment as interactive web-map applications. Using InstaGeo, we reproduced datasets from three published studies and trained models with marginal mIoU differences of -0.73 pp for flood mapping, -0.20 pp for crop segmentation, and +1.79 pp for desert locust prediction. The distilled models are up to 8x smaller than standard fine-tuned counterparts, reducing FLOPs and CO2 emissions with minimal accuracy loss. Leveraging InstaGeo's streamlined data pipeline, we also curated a larger crop segmentation dataset, achieving a state-of-the-art mIoU of 60.65%, a 12 pp improvement over prior baselines. Moreover, InstaGeo enables users to progress from raw data to model deployment within a single working day. By unifying data preparation, model compression, and deployment, InstaGeo transforms research-grade GFMs into practical, low-carbon tools for real-time, large-scale Earth observation. This approach shifts geospatial AI toward data quality and application-driven innovation. Source code, datasets, and model checkpoints are available at: https://github.com/instadeepai/InstaGeo-E2E-Geospatial-ML.git
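The abstract reports results as mIoU and as differences in percentage points (pp). As a reminder of what those numbers measure, here is a minimal pure-Python sketch of mean intersection-over-union on flattened label sequences. The function names `mean_iou` and `pp_difference` are illustrative and are not taken from the InstaGeo codebase, and the convention of skipping classes absent from both prediction and target is an assumption; the paper's exact evaluation protocol is not described on this page.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over classes that appear in the prediction or the target.

    Assumption: classes absent from both sequences are skipped rather than
    counted as IoU = 0 or 1. Other conventions exist.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        # Pixels where both agree on class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either one says class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def pp_difference(score_a: float, score_b: float) -> float:
    """Difference between two scores in [0, 1], expressed in percentage points."""
    return (score_a - score_b) * 100.0


if __name__ == "__main__":
    # Toy 4-pixel example with two classes.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

Note that a pp difference is an absolute gap between two percentages, so a 12 pp improvement to 60.65% implies a prior baseline near 48.65%, not a 12% relative gain.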
Problem

Research questions and friction points this paper is trying to address.

Automating geospatial data pipelines for satellite imagery processing
Reducing model size and computational costs of geospatial AI
Streamlining deployment of efficient models for Earth observation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated data curation transforms raw imagery into datasets
Task-specific model distillation creates compact efficient models
Seamless deployment enables interactive web-map applications
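The distillation step listed above can be pictured with the generic temperature-scaled knowledge-distillation loss. This page does not state InstaGeo's actual objective, so the sketch below is the standard soft-target formulation (KL divergence between teacher and student distributions, blended with cross-entropy on the hard label), not the authors' method. The names `distillation_loss`, `temperature`, and `alpha` are hypothetical, and the example works on a single pixel's logits in pure Python for clarity.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    label: int,
    temperature: float = 2.0,  # assumed value, for illustration only
    alpha: float = 0.5,        # assumed weight between soft and hard terms
) -> float:
    """Generic soft-target distillation loss for one sample."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the temperature-softened distributions.
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    # Cross-entropy of the student against the ground-truth label (T = 1).
    ce = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps the soft-term gradient scale comparable across temperatures.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


if __name__ == "__main__":
    print(distillation_loss([0.5, 1.5], [1.0, 3.0], label=1))
```

In a segmentation setting the same loss would typically be averaged over all pixels, with a large encoder as the teacher and a compact network as the student.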
Ibrahim Salihu Yusuf
InstaDeep
Iffanice Houndayi
InstaDeep
Rym Oualha
InstaDeep
Mohamed Aziz Cherif
InstaDeep
Kobby Panford-Quainoo
InstaDeep
Arnu Pretorius
Staff Research Scientist, InstaDeep Ltd
Reinforcement Learning, Multi-Agent Reinforcement Learning