🤖 AI Summary
This paper addresses three challenges in user trajectory prediction: mobility patterns unfold over multiple timescales, global and local temporal dependencies are insufficiently integrated, and prediction accuracy must be balanced against computational efficiency. It proposes a dual-scale collaborative architecture. A multilayer perceptron (MLP) branch captures global, long-term temporal information for each feature. In parallel, a multi-scale convolutional neural network (MSCNN) branch extracts local, short-term dynamics. A cross-attention mechanism then adaptively fuses the two scales. This design improves the model's ability to represent complex spatiotemporal dependencies. On a 12-step trajectory forecasting task, the method reduces mean squared error (MSE) by 5.04% and mean absolute error (MAE) by 4.35% relative to ModernTCN while keeping comparable inference latency. The core contribution is a lightweight dual-scale fusion design that balances modeling expressiveness against computational efficiency.
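The dual-scale data flow can be illustrated with a single forward pass. The NumPy code below is a minimal sketch under stated assumptions, not the authors' implementation. The following are all hypothetical choices made for illustration: the class name `DualScaleSketch`, the hidden size `d_model=32`, the kernel sizes (3, 5, 7), averaging the per-kernel outputs, using per-feature global tokens as queries with per-time-step local tokens as keys and values, the residual fusion, and the linear 12-step head. Weights are random, so the code only demonstrates tensor shapes and data flow.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def conv1d_same(x, w):
    # x: (T, C_in); w: (k, C_in, C_out) with odd k; zero "same" padding over time
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    T = x.shape[0]
    return sum(xp[j:j + T] @ w[j] for j in range(k))


class DualScaleSketch:
    """Hypothetical MLP + MSCNN + cross-attention forecaster (random weights, forward pass only)."""

    def __init__(self, seq_len, n_feat, d_model=32, horizon=12,
                 kernel_sizes=(3, 5, 7), seed=0):
        rng = np.random.default_rng(seed)

        def init(*shape):
            return rng.normal(0.0, 0.1, shape)

        # Global branch (assumed): MLP along the time axis, applied to each feature's full history
        self.W1, self.W2 = init(seq_len, d_model), init(d_model, d_model)
        # Local branch (assumed): one kernel per size, each mixing all features inside its window
        self.convs = [init(k, n_feat, d_model) for k in kernel_sizes]
        # Cross-attention projections and forecasting head (assumed)
        self.Wq, self.Wk, self.Wv = (init(d_model, d_model) for _ in range(3))
        self.head = init(d_model, horizon)

    def forward(self, x):
        # x: (seq_len, n_feat) history window for one user
        g = np.maximum(x.T @ self.W1, 0.0) @ self.W2  # (n_feat, d): one token per feature
        # (seq_len, d): one token per time step, averaged over kernel sizes
        loc = sum(np.maximum(conv1d_same(x, w), 0.0) for w in self.convs) / len(self.convs)
        q, k, v = g @ self.Wq, loc @ self.Wk, loc @ self.Wv
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (n_feat, seq_len)
        fused = g + attn @ v  # residual fusion of global and local information
        return (fused @ self.head).T  # (horizon, n_feat)


model = DualScaleSketch(seq_len=48, n_feat=4)
history = np.random.default_rng(1).normal(size=(48, 4))
pred = model.forward(history)
print(pred.shape)  # (12, 4)
```

With global tokens as queries, each feature's long-range summary selects which local time steps to draw on. Swapping the roles (local queries, global keys and values) is an equally plausible reading of the description.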
📝 Abstract
Trajectory prediction is essential for formulating proactive strategies that anticipate user mobility and support advance preparation. A key question is therefore how to reduce the forecasting error of user trajectory prediction while keeping inference time acceptable. However, trajectory data contains both global and local temporal information, which complicates extraction of the complete temporal pattern. Moreover, user behavior unfolds over different time scales, making behavioral patterns harder to capture. To address these challenges, we propose a trajectory prediction model based on a multilayer perceptron (MLP), a multi-scale convolutional neural network (MSCNN), and cross-attention (CA). Specifically, the MLP extracts the global temporal information of each feature. In parallel, the MSCNN extracts local temporal information by modeling interactions among features within a local temporal range. Its convolutional kernels of different sizes capture temporal information at multiple resolutions, improving the model's adaptability to different behavioral patterns. Finally, CA fuses the global and local temporal information. Experimental results show that, in 12-step prediction, our model reduces mean squared error (MSE) by 5.04% and mean absolute error (MAE) by 4.35% compared with ModernTCN, while maintaining similar inference time.
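The reported improvements are relative reductions of an error metric against a baseline, computed as 100 × (baseline − model) / baseline. The sketch below shows that arithmetic for MSE and MAE averaged over all forecast steps and features. The averaging scheme is an assumption, and the forecasts are placeholder values chosen for illustration. They are not the paper's data or results.

```python
import numpy as np


def mse(y_true, y_pred):
    # Mean squared error over all forecast steps and features
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))


def mae(y_true, y_pred):
    # Mean absolute error over all forecast steps and features
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def relative_reduction(baseline_err, model_err):
    # Percentage by which model_err is lower than baseline_err
    return 100.0 * (baseline_err - model_err) / baseline_err


# Placeholder 12-step forecasts for 2 features (illustrative values only)
y_true = np.zeros((12, 2))
baseline_pred = np.full((12, 2), 0.5)
model_pred = np.full((12, 2), 0.4)

print(round(relative_reduction(mse(y_true, baseline_pred), mse(y_true, model_pred)), 2))  # 36.0
print(round(relative_reduction(mae(y_true, baseline_pred), mae(y_true, model_pred)), 2))  # 20.0
```

Because MSE squares each error, the same absolute improvement usually shows up as a larger relative reduction in MSE than in MAE. This is consistent with, though not proof of, the pattern in the reported 5.04% versus 4.35%.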