CaSTFormer: Causal Spatio-Temporal Transformer for Driving Intention Prediction

📅 2025-07-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Driving intention prediction is limited by two problems: existing models capture spatio-temporal dependencies poorly, and human driving behavior is hard to predict, which constrains the safety and interaction quality of co-driving systems. This paper proposes CaSTFormer, a causally enhanced dual-stream Transformer framework, to address both. It has three components: (1) Reciprocal Shift Fusion (RSF) temporally aligns driver-state features with environmental-context features; (2) the Causal Pattern Extraction (CPE) module removes spurious correlations and explicitly models causal spatio-temporal dependencies; (3) a Feature Synthesis Network (FSN) adaptively fuses the resulting representations. Evaluated on the public Brain4Cars dataset, the framework achieves state-of-the-art performance. Grounding predictions in causal mechanisms is also intended to make the model's decisions more interpretable and transparent, which supports its use for intention prediction in human-machine collaborative driving systems.

📝 Abstract

Accurate prediction of driving intention is key to enhancing the safety and interactive efficiency of human-machine co-driving systems, and it serves as a cornerstone for achieving high-level autonomous driving. However, current approaches remain inadequate for accurately modeling the complex spatio-temporal interdependencies and the unpredictable variability of human driving behavior. To address these challenges, we propose CaSTFormer, a Causal Spatio-Temporal Transformer that explicitly models causal interactions between driver behavior and environmental context for robust intention prediction. Specifically, CaSTFormer introduces a novel Reciprocal Shift Fusion (RSF) mechanism for precise temporal alignment of internal and external feature streams, a Causal Pattern Extraction (CPE) module that systematically eliminates spurious correlations to reveal authentic causal dependencies, and an innovative Feature Synthesis Network (FSN) that adaptively synthesizes these purified representations into coherent spatio-temporal inferences. We evaluate the proposed CaSTFormer on the public Brain4Cars dataset, where it achieves state-of-the-art performance. It effectively captures complex causal spatio-temporal dependencies and enhances both the accuracy and transparency of driving intention prediction.
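The abstract names Reciprocal Shift Fusion but does not describe how it operates. As a purely illustrative sketch, and not the authors' implementation, the snippet below shows one way two feature streams could exchange information with a temporal offset: the first `k` channels of each stream are swapped, with the external stream delayed by one step and the internal stream advanced by one step. The function names, the one-step offset, the zero padding, and the choice of `k` are all assumptions made for this example.

```python
from typing import List, Tuple

# A feature sequence indexed as [time][channel].
Sequence = List[List[float]]


def shift_time(seq: Sequence, offset: int) -> Sequence:
    """Shift a sequence along time by `offset` steps, zero-padding the gap.

    A positive offset delays the sequence (frame t holds the old frame t - offset).
    """
    steps, channels = len(seq), len(seq[0])
    shifted = []
    for t in range(steps):
        src = t - offset
        if 0 <= src < steps:
            shifted.append(list(seq[src]))
        else:
            shifted.append([0.0] * channels)
    return shifted


def reciprocal_shift_exchange(
    inside: Sequence, outside: Sequence, k: int
) -> Tuple[Sequence, Sequence]:
    """Hypothetical cross-stream exchange (an assumption, not the paper's RSF).

    The internal (driver) stream receives the first k channels of the external
    stream delayed by one step; the external (environment) stream receives the
    first k channels of the internal stream advanced by one step.
    """
    if len(inside) != len(outside):
        raise ValueError("both streams must cover the same number of time steps")
    if k > min(len(inside[0]), len(outside[0])):
        raise ValueError("k exceeds the channel count of a stream")

    delayed_outside = shift_time(outside, 1)   # frame t holds outside[t - 1]
    advanced_inside = shift_time(inside, -1)   # frame t holds inside[t + 1]

    new_inside = [ext[:k] + own[k:] for ext, own in zip(delayed_outside, inside)]
    new_outside = [drv[:k] + own[k:] for drv, own in zip(advanced_inside, outside)]
    return new_inside, new_outside


if __name__ == "__main__":
    driver = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    road = [[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]]
    mixed_driver, mixed_road = reciprocal_shift_exchange(driver, road, k=1)
    print(mixed_driver)
    print(mixed_road)
```

In this toy setup each stream keeps most of its own channels and sees a small, time-offset slice of the other stream, which is one common way to let two modalities interact without extra parameters. Whether the paper's RSF works this way cannot be determined from the abstract alone.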

Problem

Research questions and friction points this paper is trying to address.

Model complex spatio-temporal dependencies in driving behavior

Eliminate spurious correlations for authentic causal relationships

Enhance accuracy and transparency of intention prediction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reciprocal Shift Fusion for temporal alignment

Causal Pattern Extraction to remove spurious correlations

Feature Synthesis Network for coherent inferences
