🤖 AI Summary
Urban-scale traffic flow forecasting is hampered by sparse, biased observations and a lack of principled uncertainty modeling; existing methods predominantly yield deterministic point estimates and generalize poorly across cities. To address this, we propose the first pre-trained probabilistic Transformer framework designed for cross-city generalization. It models traffic flow as an aggregation of trajectory distributions and fuses heterogeneous inputs: real-time observations, historical trajectories, and road network topology. The approach follows a “large-scale simulation pre-training + city-specific fine-tuning” paradigm, targeting both calibrated uncertainty quantification and scalable deployment. Evaluated on real-world datasets, the method outperforms state-of-the-art baselines, particularly under extreme sparsity, in both predictive accuracy and probabilistic calibration.
📝 Abstract
City-scale traffic volume prediction plays a pivotal role in intelligent transportation systems, yet it remains challenging because observational data are inherently incomplete and biased. Although deep learning-based methods have shown considerable promise, most existing approaches produce deterministic point estimates and thereby neglect the uncertainty arising from unobserved traffic flows. Furthermore, current models are typically trained in a city-specific manner, which hinders generalization and limits scalability across diverse urban contexts. To overcome these limitations, we introduce TrafficPPT, a Pretrained Probabilistic Transformer that models traffic volume as a distributional aggregation of trajectories. Our framework fuses heterogeneous data sources, including real-time observations, historical trajectory data, and road network topology, enabling robust and uncertainty-aware traffic inference. TrafficPPT is first pretrained on large-scale simulated data spanning multiple urban scenarios and then fine-tuned on target cities to ensure effective domain adaptation. Experiments on real-world datasets show that TrafficPPT consistently surpasses state-of-the-art baselines, particularly under extreme data sparsity. The code will be released as open source.
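The abstract does not spell out what "a distributional aggregation of trajectories" looks like in practice. As a rough illustration only, and not TrafficPPT's actual architecture, the sketch below assumes that each trip has some probability of traversing a given road segment and that trips are independent; both the per-trip probabilities and the independence assumption are hypothetical choices made here for clarity. Under those assumptions, the segment's volume follows a Poisson-binomial distribution, from which a point estimate and an uncertainty interval can both be read off.

```python
from typing import List, Tuple


def volume_distribution(pass_probs: List[float]) -> List[float]:
    """Return P(volume = k) for k = 0..n on one road segment.

    Assumption (not from the paper): each of the n trips traverses the
    segment independently with its own probability, so the volume is
    Poisson-binomial. The distribution is built by dynamic programming.
    """
    dist = [1.0]  # with zero trips, the volume is 0 with certainty
    for p in pass_probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)  # this trip avoids the segment
            nxt[k + 1] += mass * p      # this trip uses the segment
        dist = nxt
    return dist


def mean_and_interval(dist: List[float], level: float = 0.9) -> Tuple[float, int, int]:
    """Return the expected volume and a central interval at the given level."""
    mean = sum(k * mass for k, mass in enumerate(dist))
    lo_tail = (1.0 - level) / 2.0
    hi_tail = 1.0 - lo_tail
    lower, upper = None, None
    cumulative = 0.0
    for k, mass in enumerate(dist):
        cumulative += mass
        if lower is None and cumulative >= lo_tail:
            lower = k
        if upper is None and cumulative >= hi_tail:
            upper = k
    # Guard against floating-point shortfall in the cumulative sum.
    if lower is None:
        lower = 0
    if upper is None:
        upper = len(dist) - 1
    return mean, lower, upper


if __name__ == "__main__":
    # Hypothetical traversal probabilities for three trips.
    probs = [0.9, 0.2, 0.5]
    dist = volume_distribution(probs)
    mean, lower, upper = mean_and_interval(dist, level=0.9)
    print(f"expected volume: {mean:.2f}, 90% interval: [{lower}, {upper}]")
```

A deterministic model would report only the expected volume. Keeping the full distribution is what makes it possible to say how uncertain the estimate is on segments with few or no observations.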