🤖 AI Summary
Existing traffic forecasting models often rely on explicit graph structures or complex spatiotemporal attention mechanisms, resulting in high parameter counts and computational overhead, hindering practical deployment. To address this, we propose M3-Net—a lightweight, graph-free, pure MLP architecture. Methodologically, M3-Net innovatively integrates the Mixture of Experts (MoE) mechanism into the MLP-Mixer framework, augmented with temporal and spatiotemporal embeddings to enable efficient and scalable spatiotemporal modeling—entirely without graph neural networks. This design drastically reduces both parameter count and computational cost. Empirically, M3-Net achieves superior prediction accuracy with significantly fewer parameters across multiple real-world traffic datasets. It demonstrates strong generalization capability and low inference latency. Overall, M3-Net provides an efficient, end-to-end, and production-ready solution for large-scale intelligent transportation systems.
📝 Abstract
Achieving accurate traffic prediction is a fundamental but crucial task in the development of current intelligent transportation systems.Most of the mainstream methods that have made breakthroughs in traffic prediction rely on spatio-temporal graph neural networks, spatio-temporal attention mechanisms, etc. The main challenges of the existing deep learning approaches are that they either depend on a complete traffic network structure or require intricate model designs to capture complex spatio-temporal dependencies. These limitations pose significant challenges for the efficient deployment and operation of deep learning models on large-scale datasets. To address these challenges, we propose a cost-effective graph-free Multilayer Perceptron (MLP) based model M3-Net for traffic prediction. Our proposed model not only employs time series and spatio-temporal embeddings for efficient feature processing but also first introduces a novel MLP-Mixer architecture with a mixture of experts (MoE) mechanism. Extensive experiments conducted on multiple real datasets demonstrate the superiority of the proposed model in terms of prediction performance and lightweight deployment.