GeoMAE: Masking Representation Learning for Spatio-Temporal Graph Forecasting with Missing Values

📅 2025-08-12
🤖 AI Summary
Accurate inference of point-of-interest (POI) crowd flow is critical for urban governance, yet real-world sensing data are often sparse and noisy, and labeled samples are scarce. To address these challenges, we propose a self-supervised attributed graph representation learning framework for POI crowd flow inference. First, we construct a spatial adjacency graph from the POIs and their pairwise distances to model their topological relationships. Second, we design a contrastive learning mechanism based on swapped prediction over subgraphs, which uses large-scale unlabeled spatio-temporal data for pre-training and captures spatio-temporal similarity and the correlations between precise crowd flow and GPS reports. Finally, the pre-trained model is fine-tuned on accurate crowd flow data. Experiments on two real-world datasets show that the model pre-trained on extensive noisy data consistently outperforms models trained from scratch, which supports self-supervised pre-training as a way to handle low-quality sensing and label scarcity.

📝 Abstract
Accurate acquisition of crowd flow at Points of Interest (POIs) is pivotal for effective traffic management, public service, and urban planning. Despite this importance, due to the limitations of urban sensing techniques, the data quality from most sources is inadequate for monitoring crowd flow at each POI. This renders the inference of accurate crowd flow from low-quality data a critical and challenging task. The complexity is heightened by three key factors: 1) the scarcity and rarity of labeled data, 2) the intricate spatio-temporal dependencies among POIs, and 3) the myriad correlations between precise crowd flow and GPS reports. To address these challenges, we recast the crowd flow inference problem as a self-supervised attributed graph representation learning task and introduce a novel Contrastive Self-learning framework for Spatio-Temporal data (CSST). Our approach begins with the construction of a spatial adjacency graph based on the POIs and their respective distances. We then employ a contrastive learning technique to exploit large volumes of unlabeled spatio-temporal data. We adopt a swapped prediction approach to anticipate the representation of the target subgraph from similar instances. Following the pre-training phase, the model is fine-tuned with accurate crowd flow data. Our experiments, conducted on two real-world datasets, demonstrate that the model pre-trained on extensive noisy data consistently outperforms models trained from scratch.
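
The abstract states that the spatial adjacency graph is built from the POIs and their distances, but it does not give the exact construction. The following is a minimal sketch of one common choice, assuming haversine distance, a hard distance cutoff, and Gaussian edge weights. The function names, the `radius_km` parameter, and the sample coordinates are hypothetical and are not taken from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def build_adjacency(coords, radius_km=1.0):
    """Return a symmetric weighted adjacency matrix (list of lists).

    Assumption: two POIs are connected only if they lie within radius_km,
    and closer POIs receive larger weights through a Gaussian kernel.
    """
    n = len(coords)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = haversine_km(coords[i], coords[j])
            if d <= radius_km:
                weight = math.exp(-((d / radius_km) ** 2))
                adj[i][j] = adj[j][i] = weight
    return adj


# Hypothetical POIs: two close together, one several kilometres away.
pois = [(39.9042, 116.4074), (39.9050, 116.4080), (39.9900, 116.3000)]
adj = build_adjacency(pois, radius_km=1.0)
```

In this sketch the first two POIs are linked and the third is isolated. A k-nearest-neighbour rule would be an equally plausible alternative when POI density varies strongly across the city.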
Problem

Research questions and friction points this paper is trying to address.

Inferring accurate crowd flow from low-quality data
Addressing scarcity of labeled spatio-temporal data
Modeling complex spatio-temporal dependencies among POIs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrastive self-learning for spatio-temporal data
Spatial adjacency graph based on POI distances
Swapped prediction for target subgraph representation
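
Swapped prediction can be illustrated with a small loss function. The sketch below follows the general SwAV-style recipe, in which each of two representations is scored against a set of prototype vectors and the soft assignment ("code") of one is predicted from the other. It is an assumption that the paper's objective resembles this; the code also replaces the usual Sinkhorn-Knopp balancing step with a sharpened softmax for brevity. All names, temperatures, and toy vectors are hypothetical.

```python
import math


def softmax(scores, temperature):
    """Numerically stable softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def cross_entropy(target, pred):
    return -sum(t * math.log(p) for t, p in zip(target, pred))


def swapped_prediction_loss(z_target, z_similar, prototypes,
                            temperature=0.1, code_temperature=0.05):
    """Predict each embedding's code from the other embedding.

    z_target:  embedding of the target subgraph (assumed L2-normalised)
    z_similar: embedding of a similar subgraph (assumed L2-normalised)
    """
    scores_t = [dot(z_target, c) for c in prototypes]
    scores_s = [dot(z_similar, c) for c in prototypes]
    # Codes: sharpened assignments, treated as fixed targets in training.
    q_t = softmax(scores_t, code_temperature)
    q_s = softmax(scores_s, code_temperature)
    # Predictions: softer assignments that the encoder is trained to match.
    p_t = softmax(scores_t, temperature)
    p_s = softmax(scores_s, temperature)
    # The swap: the target's code is predicted from the similar instance, and vice versa.
    return 0.5 * (cross_entropy(q_t, p_s) + cross_entropy(q_s, p_t))


prototypes = [[1.0, 0.0], [0.0, 1.0]]
target = [1.0, 0.0]
loss_similar = swapped_prediction_loss(target, [0.8, 0.6], prototypes)
loss_dissimilar = swapped_prediction_loss(target, [0.0, 1.0], prototypes)
```

With these toy vectors the loss is lower for the pair that points in nearly the same direction, which is the behaviour the pre-training objective relies on.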
Songyu Ke
College of Computer and Data Science, Fuzhou University, Fuzhou, China; JD Intelligent Cities Research, Beijing, China; JD iCity, JD Technology, Beijing, China
Chenyu Wu
Tsinghua University
Turbulence modeling, machine learning
Yuxuan Liang
Assistant Professor, Hong Kong University of Science and Technology (Guangzhou)
Spatio-Temporal Data Mining, Urban Computing, Urban AI, Foundation Models, Time Series
Xiuwen Yi
JD iCity, JD Technology, Beijing, China
Yanping Sun
JD iCity, JD Technology, Beijing, China
Junbo Zhang
JD iCity, JD Technology, Beijing, China; School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, China
Yu Zheng
JD iCity, JD Technology, Beijing, China; School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, China