Beyond AlphaEarth: Toward Human-Centered Spatial Representation via POI-Guided Contrastive Learning

📅 2025-10-10
🤖 AI Summary
Existing geospatial foundation models such as AlphaEarth (AE) learn physical surface representations from global Earth Observation data at 10-meter resolution, but they struggle to capture urban functional and socioeconomic semantics. To address this gap, the authors propose AETHER, a framework that injects Point-of-Interest (POI) textual semantics into a pretrained geospatial foundation model through lightweight multimodal alignment. AETHER uses POI-guided contrastive learning to align remote sensing features with human activity semantics. Its methodology combines contrastive learning, cross-modal representation alignment, and spatial encoder fine-tuning, preserving model scalability while jointly encoding surface morphology and urban functionality. In experiments over Greater London, AETHER achieves a 7.2% relative improvement in land-use classification F1 and a 23.6% relative reduction in KL divergence for socioeconomic distribution estimation compared with the AE baseline. AETHER thus narrows a semantic gap in geospatial understanding driven purely by remote sensing.

📝 Abstract
General-purpose spatial representations are essential for building transferable geospatial foundation models (GFMs). Among them, the AlphaEarth Foundation (AE) represents a major step toward a global, unified representation of the Earth's surface, learning 10-meter embeddings from multi-source Earth Observation (EO) data that capture rich physical and environmental patterns across diverse landscapes. However, such EO-driven representations remain limited in capturing the functional and socioeconomic dimensions of cities, as they primarily encode physical and spectral patterns rather than human activities or spatial functions. We propose AETHER (AlphaEarth-POI Enriched Representation Learning), a lightweight framework that adapts AlphaEarth to human-centered urban analysis through multimodal alignment guided by Points of Interest (POIs). AETHER aligns AE embeddings with textual representations of POIs, enriching physically grounded EO features with semantic cues about urban functions and socioeconomic contexts. In Greater London, AETHER achieves consistent gains over the AE baseline, with a 7.2% relative improvement in land-use classification F1 and a 23.6% relative reduction in Kullback-Leibler divergence for socioeconomic mapping. Built upon pretrained AE, AETHER leverages a lightweight multimodal alignment to enrich it with human-centered semantics while remaining computationally efficient and scalable for urban applications. By coupling EO with human-centered semantics, it advances geospatial foundation models toward general-purpose urban representations that integrate both physical form and functional meaning.
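The abstract reports the socioeconomic result as a relative reduction in Kullback-Leibler divergence. As a minimal sketch of how such a figure is computed, the snippet below evaluates KL divergence between an observed distribution and two predicted ones, then reports the signed relative change. The function names and all numbers are hypothetical and chosen only for illustration; they are not the paper's data or evaluation code.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as probability sequences."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:  # terms with p_i = 0 contribute nothing
            total += pi * math.log(pi / max(qi, eps))
    return total


def relative_change(baseline, new):
    """Signed relative change of `new` with respect to `baseline`."""
    return (new - baseline) / baseline


# Hypothetical distributions over three socioeconomic classes (illustration only).
observed = [0.5, 0.3, 0.2]
pred_baseline = [0.3, 0.4, 0.3]    # stand-in for a baseline model's prediction
pred_enriched = [0.4, 0.35, 0.25]  # stand-in for an enriched model's prediction

kl_base = kl_divergence(observed, pred_baseline)
kl_new = kl_divergence(observed, pred_enriched)
print(f"KL baseline: {kl_base:.4f}")
print(f"KL enriched: {kl_new:.4f}")
print(f"Relative change: {relative_change(kl_base, kl_new):+.1%}")
```

A negative relative change means the enriched prediction is closer to the observed distribution. Under this convention, a "23.6% relative reduction" corresponds to a relative change of -0.236.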
Problem

Research questions and friction points this paper is trying to address.

Enriching EO spatial representations with human-centered urban semantics
Aligning physical landscape features with socioeconomic functional patterns
Integrating POI-guided multimodal learning into geospatial foundation models
Innovation

Methods, ideas, or system contributions that make the work stand out.

POI-guided contrastive learning enriches spatial representations
Multimodal alignment integrates physical features with urban semantics
Lightweight framework enhances geospatial models with socioeconomic contexts
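The points above describe aligning EO embeddings with POI text embeddings through contrastive learning. This page does not give the paper's exact objective, so the sketch below assumes a symmetric InfoNCE loss of the kind commonly used for image-text alignment. It also assumes both modalities have already been projected to the same dimension. The function names, the temperature value, and the toy vectors are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_softmax_at(row, idx):
    """Numerically stable log-softmax of `row`, evaluated at position `idx`."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return row[idx] - lse


def symmetric_info_nce(eo_embs, poi_embs, temperature=0.07):
    """Symmetric InfoNCE over a batch where eo_embs[i] and poi_embs[i] are a positive pair.

    Assumed objective for illustration; not confirmed as the paper's loss.
    """
    if not eo_embs or len(eo_embs) != len(poi_embs):
        raise ValueError("need two non-empty batches of equal size")
    eo = [l2_normalize(v) for v in eo_embs]
    poi = [l2_normalize(v) for v in poi_embs]
    n = len(eo)
    # logits[i][j]: scaled cosine similarity between EO item i and POI item j
    logits = [[dot(eo[i], poi[j]) / temperature for j in range(n)] for i in range(n)]
    # EO -> POI direction: each row should pick out its own column
    loss_eo = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # POI -> EO direction: each column should pick out its own row
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_poi = -sum(log_softmax_at(cols[j], j) for j in range(n)) / n
    return 0.5 * (loss_eo + loss_poi)


# Toy batch: two locations with 2-D embeddings (illustration only).
eo_batch = [[1.0, 0.0], [0.0, 1.0]]
poi_matched = [[1.0, 0.0], [0.0, 1.0]]
poi_swapped = [[0.0, 1.0], [1.0, 0.0]]
print(symmetric_info_nce(eo_batch, poi_matched))
print(symmetric_info_nce(eo_batch, poi_swapped))
```

The loss is small when each location's EO embedding is most similar to its own POI embedding and large when the pairs are mismatched, which is the signal that pulls the two modalities into a shared space.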
Junyuan Liu
SpaceTimeLab, Department of Civil, Environmental and Geomatic Engineering, University College London, London, WC1E 6BT, United Kingdom
Quan Qin
SpaceTimeLab, Department of Civil, Environmental and Geomatic Engineering, University College London, London, WC1E 6BT, United Kingdom; School of Resource and Environmental Sciences, Wuhan University, Wuhan, 430079, China
Guangsheng Dong
SpaceTimeLab, Department of Civil, Environmental and Geomatic Engineering, University College London, London, WC1E 6BT, United Kingdom; State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, 430079, China
Xinglei Wang
PhD Student, University College London
GIScience, Human mobility, Urban analytics, Spatio-temporal data mining
Jiazhuang Feng
SpaceTimeLab, Department of Civil, Environmental and Geomatic Engineering, University College London, London, WC1E 6BT, United Kingdom
Zichao Zeng
SpaceTimeLab, Department of Civil, Environmental and Geomatic Engineering, University College London, London, WC1E 6BT, United Kingdom
Tao Cheng
Professor in GeoInformatics, University College London
Geographical Information Science, Space-Time Analytics, Smart Cities, GeoComputation, Network Complexity