Causal Spatio-Temporal Prediction: An Effective and Efficient Multi-Modal Approach

📅 2025-05-23
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
In spatio-temporal forecasting for intelligent transportation, weather prediction, and urban planning, existing methods face three key limitations: insufficient multi-modal fusion, causal confounding, and high computational complexity. To address these, we propose E^2-CSTP, a dual-branch causal inference framework. It fuses heterogeneous data sources with cross-modal attention and gating mechanisms, uses an auxiliary branch that models additional modalities and applies causal interventions to reduce confounding bias, and combines GCN with the Mamba architecture to accelerate spatio-temporal encoding. On four real-world datasets, the method significantly outperforms nine state-of-the-art models, improving accuracy by up to 9.66% and reducing computational overhead by 17.37% to 56.11%. The framework thus aims to balance predictive performance, causal soundness, and computational efficiency.
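The cross-modal attention and gating fusion mentioned above can be illustrated with a minimal NumPy sketch. It assumes a primary feature matrix (one row per node) and a single auxiliary modality already embedded to the same width d; the single-head attention, the sigmoid gate, and all weight names (`W_q`, `W_k`, `W_v`, `W_g`) are illustrative assumptions, not E^2-CSTP's published layer.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def gated_cross_modal_fusion(h_main, h_aux, params):
    """Fuse one auxiliary modality into the main stream (illustrative sketch).

    h_main: (N, d) primary spatio-temporal features, one row per node
    h_aux:  (M, d) auxiliary-modality tokens, e.g. weather or event embeddings
    params: hypothetical weights W_q, W_k, W_v of shape (d, d) and W_g of shape (2d, d)
    """
    d = h_main.shape[1]
    q = h_main @ params["W_q"]              # queries come from the main modality
    k = h_aux @ params["W_k"]               # keys and values come from the auxiliary one
    v = h_aux @ params["W_v"]
    attn = softmax(q @ k.T / np.sqrt(d))    # (N, M) cross-modal attention weights
    context = attn @ v                      # (N, d) auxiliary evidence gathered per node
    # Sigmoid gate: per node and channel, how much auxiliary context to let through
    gate_in = np.concatenate([h_main, context], axis=1) @ params["W_g"]
    g = 1.0 / (1.0 + np.exp(-gate_in))
    return g * context + (1.0 - g) * h_main, attn


rng = np.random.default_rng(0)
N, M, d = 5, 3, 8
params = {name: rng.normal(scale=0.1, size=(d, d)) for name in ("W_q", "W_k", "W_v")}
params["W_g"] = rng.normal(scale=0.1, size=(2 * d, d))
fused, attn = gated_cross_modal_fusion(rng.normal(size=(N, d)), rng.normal(size=(M, d)), params)
print(fused.shape)  # (5, 8)
```

The gate lets the model fall back to the main-modality features when the auxiliary source is uninformative, a per-channel switch that plain concatenation of modalities would not provide.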

๐Ÿ“ Abstract
Spatio-temporal prediction plays a crucial role in intelligent transportation, weather forecasting, and urban planning. While integrating multi-modal data has shown potential for enhancing prediction accuracy, key challenges persist: (i) inadequate fusion of multi-modal information, (ii) confounding factors that obscure causal relations, and (iii) high computational complexity of prediction models. To address these challenges, we propose E^2-CSTP, an Effective and Efficient Causal multi-modal Spatio-Temporal Prediction framework. E^2-CSTP leverages cross-modal attention and gating mechanisms to effectively integrate multi-modal data. Building on this, we design a dual-branch causal inference approach: the primary branch focuses on spatio-temporal prediction, while the auxiliary branch mitigates bias by modeling additional modalities and applying causal interventions to uncover true causal dependencies. To improve model efficiency, we integrate GCN with the Mamba architecture for accelerated spatio-temporal encoding. Extensive experiments on 4 real-world datasets show that E^2-CSTP significantly outperforms 9 state-of-the-art methods, achieving up to 9.66% improvements in accuracy as well as 17.37%-56.11% reductions in computational overhead.
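The abstract does not spell out which causal intervention the auxiliary branch applies. A common way to realize a do-intervention in neural networks is backdoor adjustment, P(Y | do(X)) = Σ_z P(Y | X, z) P(z), approximated by averaging over a dictionary of confounder prototypes. The NumPy sketch below assumes that technique; the confounder dictionary, its prior, the tanh feature map, and the weight names `W_h` and `W_z` are hypothetical and may differ from what E^2-CSTP actually does.

```python
import numpy as np


def backdoor_adjusted_features(h, confounders, prior, W_h, W_z):
    """Approximate sum_z P(z) f(h, z), i.e. the representation under do(h).

    h:           (N, d) node representations from the primary branch
    confounders: (K, d) hypothetical confounder dictionary, e.g. prototypes of
                 auxiliary-modality states such as weather regimes
    prior:       (K,) prior P(z) over the dictionary entries, summing to 1
    W_h, W_z:    (d, d) hypothetical projection weights defining f
    """
    # Evaluate f(h, z) under every stratum z of the confounder ...
    per_stratum = np.stack([np.tanh(h @ W_h + z @ W_z) for z in confounders])  # (K, N, d)
    # ... then average with the prior P(z), not the observational P(z | h):
    # this weighting is what blocks the backdoor path through z.
    return np.tensordot(prior, per_stratum, axes=1)  # (N, d)


rng = np.random.default_rng(1)
N, K, d = 4, 3, 6
h = rng.normal(size=(N, d))
confounders = rng.normal(size=(K, d))
prior = np.array([0.5, 0.3, 0.2])
W_h, W_z = rng.normal(scale=0.3, size=(2, d, d))
h_do = backdoor_adjusted_features(h, confounders, prior, W_h, W_z)
print(h_do.shape)  # (4, 6)
```

In a dual-branch setup, the primary branch would predict from `h` while the auxiliary branch predicts from the adjusted `h_do`; how the two outputs or losses are combined is not specified in the abstract.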
Problem

Research questions and friction points this paper is trying to address.

Inadequate fusion of multi-modal information for prediction
Confounding factors obscuring causal relations in data
High computational complexity of spatio-temporal models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-modal attention for multi-modal fusion
Dual-branch causal inference approach
GCN-Mamba for efficient spatio-temporal encoding
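The abstract says only that GCN is integrated with the Mamba architecture to speed up spatio-temporal encoding. One plausible arrangement, assumed here rather than taken from the paper, applies a graph convolution at every time step and then runs a simplified Mamba-style selective state-space scan along time for each node. All function and parameter names (`normalized_adjacency`, `selective_scan`, `W_gcn`, `A_log`, `W_delta`, `W_B`, `W_C`) are hypothetical, and the scan omits Mamba's gating branch, short convolution, and skip term. The efficiency argument is that the recurrence is linear in sequence length T, whereas self-attention over T steps is quadratic.

```python
import numpy as np


def normalized_adjacency(adj):
    """GCN normalization D^{-1/2} (A + I) D^{-1/2} with self-loops."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(x, a_norm, W):
    # x: (T, N, F) -> (T, N, H); the same spatial filter is applied at every time step
    return np.maximum(a_norm @ x @ W, 0.0)


def selective_scan(u, A_log, W_delta, W_B, W_C):
    """Simplified Mamba-style selective state-space scan along time, per node.

    u:        (T, N, H) inputs
    A_log:    (H, S) parameterizes a diagonal state matrix A = -exp(A_log) < 0
    W_delta:  (H, H) maps inputs to per-channel step sizes delta_t
    W_B, W_C: (H, S) map inputs to the input-dependent B_t and C_t
    """
    T, N, H = u.shape
    A = -np.exp(A_log)                                # negative entries keep the recurrence stable
    state = np.zeros((N, H, A.shape[1]))
    ys = np.empty_like(u)
    for t in range(T):                                # one pass over time: O(T), no T x T matrix
        x = u[t]                                      # (N, H)
        delta = np.logaddexp(0.0, x @ W_delta)        # softplus -> positive (N, H) step sizes
        B = x @ W_B                                   # (N, S)
        C = x @ W_C                                   # (N, S)
        A_bar = np.exp(delta[..., None] * A)          # (N, H, S) zero-order-hold decay
        B_bar = delta[..., None] * B[:, None, :]      # (N, H, S) simplified (Euler) input term
        state = A_bar * state + B_bar * x[..., None]  # h_t = A_bar h_{t-1} + B_bar x_t
        ys[t] = np.einsum("nhs,ns->nh", state, C)     # y_t = C_t h_t
    return ys


def gcn_mamba_encoder(x, adj, p):
    """Hypothetical composition: spatial GCN per time step, then temporal selective scan."""
    h = gcn_layer(x, normalized_adjacency(adj), p["W_gcn"])
    return selective_scan(h, p["A_log"], p["W_delta"], p["W_B"], p["W_C"])


rng = np.random.default_rng(2)
T, N, F, H, S = 12, 4, 3, 8, 4
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)  # 4-node path
p = {
    "W_gcn": rng.normal(scale=0.3, size=(F, H)),
    "A_log": np.zeros((H, S)),
    "W_delta": rng.normal(scale=0.1, size=(H, H)),
    "W_B": rng.normal(scale=0.1, size=(H, S)),
    "W_C": rng.normal(scale=0.1, size=(H, S)),
}
x = rng.normal(size=(T, N, F))
out = gcn_mamba_encoder(x, adj, p)
print(out.shape)  # (12, 4, 8)
```

Because the scan is causal in time, perturbing the last time step leaves every earlier output unchanged, and the per-step cost does not grow with sequence length.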