GeoLink: Empowering Remote Sensing Foundation Model with OpenStreetMap Data

📅 2025-09-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: A significant modality gap exists between remote sensing imagery and vector geospatial data (e.g., OpenStreetMap), stemming from structural heterogeneity, semantic granularity mismatch, and spatial misalignment. Method: We propose a multimodal remote sensing foundation model framework featuring a unified encoder that jointly models cross-modal spatial correlations, employs self-supervised contrastive learning, and integrates sparse masked reconstruction, using OSM vector data as weak supervision for remote sensing image pretraining. Additionally, we incorporate multi-granularity geographic priors to enable joint pixel-level and object-level representation learning. Contribution/Results: This work presents the first systematic end-to-end co-pretraining framework for remote sensing imagery and OSM. It achieves substantial downstream performance gains: average mIoU improvements of 4.2% on land cover classification and urban functional zoning tasks, along with enhanced generalization in complex urban scenes. Together, these results establish a scalable, multimodal foundation model for geospatial intelligence.
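For readers unfamiliar with the metric quoted in the summary, mean intersection-over-union (mIoU) averages the per-class overlap between predicted and reference label maps. The sketch below uses the standard definition; the summary does not say how the paper computes it, so the function name `mean_iou` and the choice to skip classes absent from both maps are assumptions made for illustration.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in the prediction or the reference.

    Illustrative only: this is the textbook definition, not code from GeoLink.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # confusion[i, j] counts pixels whose true class is i and predicted class is j.
    confusion = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(confusion)
    # Union = predicted pixels + reference pixels - pixels counted in both.
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection
    valid = union > 0  # assumption: ignore classes absent from both maps
    return float((intersection[valid] / union[valid]).mean())

# Toy example with two classes and four pixels.
score = mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 1], num_classes=2)
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so `score` is their mean, 7/12.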

📝 Abstract
Integrating ground-level geospatial data with rich geographic context, like OpenStreetMap (OSM), into remote sensing (RS) foundation models (FMs) is essential for advancing geospatial intelligence and supporting a broad spectrum of tasks. However, the modality gap between RS and OSM data, including differences in data structure, content, and spatial granularity, makes effective synergy highly challenging, and most existing RS FMs focus on imagery alone. To this end, this study presents GeoLink, a multimodal framework that leverages OSM data to enhance RS FMs during both the pretraining and downstream task stages. Specifically, GeoLink enhances RS self-supervised pretraining using multi-granularity learning signals derived from OSM data, guided by cross-modal spatial correlations for information interaction and collaboration. It also introduces image mask-reconstruction to enable sparse input for efficient pretraining. For downstream tasks, GeoLink generates both unimodal and multimodal fine-grained encodings to support a wide range of applications, from common RS interpretation tasks like land cover classification to more comprehensive geographic tasks like urban function zone mapping. Extensive experiments show that incorporating OSM data during pretraining enhances the performance of the RS image encoder, while fusing RS and OSM data in downstream tasks improves the FM's adaptability to complex geographic scenarios. These results underscore the potential of multimodal synergy in advancing high-level geospatial artificial intelligence. Moreover, we find that spatial correlation plays a crucial role in enabling effective multimodal geospatial data integration. Code, checkpoints, and usage examples are released at https://github.com/bailubin/GeoLink_NeurIPS2025
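The abstract mentions image mask-reconstruction with sparse input but gives no implementation detail. As a rough illustration of the general idea (in the style of masked autoencoders, and not taken from the GeoLink code), the sketch below splits an image into patches and passes only a random visible subset onward, which is what makes the encoder input sparse. The function name `random_patch_mask`, the 16-pixel patch size, and the 75% mask ratio are all assumptions chosen for the example.

```python
import numpy as np

def random_patch_mask(image, patch_size=16, mask_ratio=0.75, seed=0):
    """Return the visible patches, their indices, and a boolean mask.

    Hypothetical helper for illustration; GeoLink's actual masking may differ.
    image: array of shape (H, W, C) with H and W divisible by patch_size.
    """
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    # (H, W, C) -> (gh, p, gw, p, C) -> (gh, gw, p, p, C) -> (gh*gw, p*p*C)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(gh * gw, -1)
    rng = np.random.default_rng(seed)
    num_keep = int(round(gh * gw * (1.0 - mask_ratio)))
    # Sample which patches stay visible; sort to preserve spatial order.
    keep_idx = np.sort(rng.permutation(gh * gw)[:num_keep])
    mask = np.ones(gh * gw, dtype=bool)  # True marks a masked (hidden) patch
    mask[keep_idx] = False
    return patches[keep_idx], keep_idx, mask

# A 64x64 RGB image yields a 4x4 grid of patches; a quarter remain visible.
img = np.arange(64 * 64 * 3, dtype=np.float32).reshape(64, 64, 3)
visible, keep_idx, mask = random_patch_mask(img)
```

Only `visible` would be fed to the encoder in such a scheme, while a decoder would be trained to reconstruct the patches where `mask` is true.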
Problem

Research questions and friction points this paper is trying to address.

Bridging the modality gap between remote sensing and OpenStreetMap data
Enhancing foundation models with multi-granularity geospatial learning signals
Improving adaptability to complex geographic scenarios through multimodal fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages OpenStreetMap data for remote sensing pretraining
Uses cross-modal spatial correlations for information interaction
Generates multimodal encodings for downstream task adaptability
Lubin Bai
School of Earth and Space Sciences, Peking University, Beijing, China
Xiuyuan Zhang
College of Urban and Environmental Sciences, Peking University, Beijing, China
Siqi Zhang
State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, CAS, Beijing, China
Zepeng Zhang
EPFL
Machine Learning, Graph Neural Network
Haoyu Wang
College of Urban and Environmental Sciences, Peking University, Beijing, China
Wei Qin
Xidian University
Brain Stimulation, Biomedical Signal Processing
Shihong Du
College of Urban and Environmental Sciences, Peking University, Beijing, China