🤖 AI Summary
Accurate inference of point-of-interest (POI) crowd flow is critical for urban governance, yet real-world sensing data are often sparse, noisy, and severely limited in labeled samples. To address these challenges, we propose a self-supervised attributed graph representation learning framework for POI crowd flow prediction. First, we construct a spatial adjacency graph to model topological relationships among POIs. Second, we design a swap-based subgraph contrastive learning mechanism that leverages large-scale unlabeled spatiotemporal data for pretraining, explicitly capturing spatiotemporal similarity and multi-source correlations inherent in GPS reports. Finally, we jointly optimize the model via masked reconstruction and downstream fine-tuning. Experiments on two real-world datasets demonstrate that our method significantly outperforms supervised baselines, achieving substantial gains in prediction accuracy and generalization—particularly under low-quality sensing and scarce labeling conditions. This work establishes a novel paradigm for spatiotemporal graph modeling under label scarcity.
📝 Abstract
Accurate acquisition of crowd flow at Points of Interest (POIs) is pivotal for effective traffic management, public service, and urban planning. However, because of the limitations of urban sensing techniques, the data from most sources are too low in quality to monitor crowd flow at each POI. Inferring accurate crowd flow from low-quality data is therefore a critical and challenging task. Three key factors add to the complexity: 1) the scarcity and rarity of labeled data, 2) the intricate spatio-temporal dependencies among POIs, and 3) the myriad correlations between precise crowd flow and GPS reports.
To address these challenges, we recast the crowd flow inference problem as a self-supervised attributed graph representation learning task and introduce a novel Contrastive Self-learning framework for Spatio-Temporal data (CSST). Our approach begins by constructing a spatial adjacency graph based on the POIs and the distances between them. We then employ contrastive learning to exploit large volumes of unlabeled spatio-temporal data, adopting a swapped prediction approach that predicts the representation of the target subgraph from similar instances. After pre-training, the model is fine-tuned with accurate crowd flow data. Experiments on two real-world datasets demonstrate that the model pre-trained on extensive noisy data consistently outperforms models trained from scratch.
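As a minimal illustration of the first step, a spatial adjacency graph over POIs can be sketched as follows. The abstract only states that the graph is founded on the POIs and their distances; the haversine metric, the fixed `radius_km` threshold, the binary edge weights, and the sample coordinates are all assumptions made here for illustration, not details confirmed by the paper.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def build_adjacency(pois, radius_km=1.0):
    """Connect two POIs with an undirected edge when they lie within radius_km.

    The thresholding rule is an assumption; a distance-weighted or
    k-nearest-neighbour graph would be equally plausible readings.
    """
    n = len(pois)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if haversine_km(pois[i], pois[j]) <= radius_km:
                adj[i][j] = adj[j][i] = 1
    return adj

# Hypothetical POI coordinates: two close together, one roughly 13 km away.
pois = [(39.9042, 116.4074), (39.9050, 116.4080), (39.9900, 116.3000)]
adj = build_adjacency(pois, radius_km=1.0)
```

Here the first two POIs are linked and the third is isolated. In a real pipeline the threshold would need tuning to the POI density of the city in question.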