🤖 AI Summary
Existing GNN- and contrastive learning–based approaches for road network representation struggle to jointly capture spatial heterogeneity and temporal dynamics, primarily because their neighborhood smoothing assumption is oversimplified. To address this, we propose DST, a dual-branch spatiotemporal self-supervised framework. First, DST constructs three types of hyperedges to form a hypergraph, enabling hypergraph contrastive learning to model complex, heterogeneous spatial relationships. Second, it designs a mix-hop transition matrix to encode trajectory dynamics and performs next-token prediction with a causal Transformer, regularized by distinguishing weekday from weekend traffic modes, thereby achieving traffic-pattern-aware zero-shot generalization. Evaluated on multiple benchmarks, DST significantly outperforms state-of-the-art methods, especially under zero-shot settings, demonstrating superior robustness. These results validate both the effectiveness and the necessity of joint spatiotemporal modeling for road representation learning.
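The mix-hop transition matrix mentioned above can be illustrated with a minimal sketch: blend powers of a trajectory-derived, row-stochastic transition matrix so that multi-hop road transitions feed into graph convolution. The function name, hop count, and decay weights below are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def mix_hop_transition(P, hops=3, alpha=0.5):
    """Hypothetical mix-hop matrix: geometrically weighted sum of P^1..P^hops.

    P is a row-stochastic transition matrix estimated from trajectories;
    alpha (assumed) decays the contribution of longer hops.
    """
    M = np.zeros_like(P)
    Pk = np.eye(P.shape[0])
    norm = 0.0
    for k in range(1, hops + 1):
        Pk = Pk @ P          # k-hop transition probabilities
        w = alpha ** k
        M += w * Pk
        norm += w
    return M / norm          # renormalize so M stays row-stochastic

# Toy 3-road network: transition counts observed in trajectories.
counts = np.array([[0, 4, 1], [2, 0, 3], [1, 1, 0]], dtype=float)
P = counts / counts.sum(axis=1, keepdims=True)
M = mix_hop_transition(P)
print(np.allclose(M.sum(axis=1), 1.0))  # a convex mix of stochastic matrices is stochastic
```

Because each `P^k` is row-stochastic and the weights are renormalized, the mixed matrix remains a valid transition operator for convolution.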
📝 Abstract
Road network representation learning (RNRL) has attracted increasing attention from both researchers and practitioners as various spatiotemporal tasks emerge. Recent advanced methods leverage Graph Neural Networks (GNNs) and contrastive learning to characterize the spatial structure of road segments in a self-supervised paradigm. However, the spatial heterogeneity and temporal dynamics of road networks pose severe challenges to the neighborhood smoothing mechanism of self-supervised GNNs. To address these issues, we propose a $\textbf{D}$ual-branch $\textbf{S}$patial-$\textbf{T}$emporal self-supervised representation framework for enhanced road representations, termed DST. On one hand, DST designs a mix-hop transition matrix for graph convolution to incorporate dynamic relations of roads from trajectories. In addition, DST contrasts road representations of the vanilla road network against those of the hypergraph in a spatial self-supervised way. The hypergraph is newly built from three types of hyperedges to capture long-range relations. On the other hand, DST performs next token prediction as the temporal self-supervised task on sequences of traffic dynamics with a causal Transformer, which is further regularized by differentiating traffic modes of weekdays from those of weekends. Extensive experiments against state-of-the-art methods verify the superiority of our proposed framework. Moreover, the comprehensive spatiotemporal modeling enables DST to excel in zero-shot learning scenarios.
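The causal Transformer branch can be sketched at its core: a self-attention step with a causal mask, so the representation at step t only attends to steps ≤ t, which is what makes next-token prediction on the traffic sequence well-posed. This is a bare, assumed illustration (single head, no learned projections), not the paper's architecture.

```python
import numpy as np

def causal_attention(X):
    """Single-head self-attention over a (T, d) traffic sequence with a causal mask."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                      # (T, T) pairwise scores
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores[mask] = -np.inf                             # block attention to future steps
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ X                                 # context used to predict the next step

T, d = 6, 4
rng = np.random.default_rng(0)
X = rng.normal(size=(T, d))                            # toy traffic-dynamics sequence
out = causal_attention(X)
print(out.shape)
```

Note that the first output position can only attend to itself, so `out[0]` equals `X[0]`; the weekday/weekend regularization described in the abstract would sit on top of this backbone as an additional objective.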