🤖 AI Summary
Addressing two key challenges in time-series causal discovery—modeling nonlinear dependencies and mitigating spurious correlations—this paper proposes a prior-augmented multi-layer Transformer framework. Methodologically: (1) it introduces an attention masking mechanism that uniformly enforces causal exclusion constraints across all Transformer layers, explicitly encoding domain knowledge to suppress spurious causal relations; (2) it jointly estimates causal direction and temporal lag via gradient-based analysis of the predictive model. The core innovation lies in embedding structural priors into the entire attention computation process, enabling end-to-end co-learning of causal graphs and temporal dynamics. Evaluated on standard benchmarks, the method achieves a 12.8% improvement in causal discovery F1-score and attains 98.9% accuracy in causal lag estimation, significantly outperforming existing state-of-the-art approaches.
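The masking idea described above (a uniform additive mask that zeroes out attention from excluded causes at every layer) can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation; the variable names, `exclusion_mask`, and `masked_attention` are hypothetical, and a real Transformer would apply the same mask inside each layer's scaled dot-product attention.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def exclusion_mask(variables, excluded_edges):
    """Additive attention mask: -inf wherever a query variable (potential
    effect) must not attend to a key variable (an excluded cause)."""
    idx = {v: i for i, v in enumerate(variables)}
    mask = np.zeros((len(variables), len(variables)))
    for cause, effect in excluded_edges:
        mask[idx[effect], idx[cause]] = -np.inf
    return mask

def masked_attention(scores, mask):
    # The same mask is added to the raw scores in every layer, so the
    # attention weight on an excluded cause is exactly zero throughout.
    return softmax(scores + mask, axis=-1)
```

With `exclusion_mask(["x", "y", "z"], [("z", "x")])`, the attention weight from query `x` to key `z` is forced to zero while each row still sums to one, which is how a domain-knowledge exclusion suppresses a spurious causal link.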
📝 Abstract
We introduce a novel framework for temporal causal discovery and inference that addresses two key challenges: complex nonlinear dependencies and spurious correlations. Our approach employs a multi-layer Transformer-based time-series forecaster to capture long-range, nonlinear temporal relationships among variables. After training, we extract the underlying causal structure and associated time lags from the forecaster using gradient-based analysis, enabling the construction of a causal graph. To mitigate the impact of spurious causal relationships, we incorporate a prior knowledge integration mechanism based on attention masking, which consistently enforces user-specified causal exclusions across multiple Transformer layers. Extensive experiments show that our method significantly outperforms other state-of-the-art approaches, achieving a 12.8% improvement in F1-score for causal discovery and 98.9% accuracy in estimating causal lags.
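The gradient-based extraction step can be illustrated with a small sketch: treat the trained forecaster as a black box, score how strongly each output variable responds to each input variable at each lag, then read off edges and lags from the peak responses. This is an assumption-laden toy, not the paper's method: `grad_saliency` and `extract_graph` are hypothetical names, and finite differences stand in for the autograd gradients a real implementation would use.

```python
import numpy as np

def grad_saliency(forecaster, x, eps=1e-4):
    """Finite-difference saliency |d forecaster(x)[j] / d x[t, i]|.

    x: (T, n) input window; forecaster maps it to an (n,) next-step
    prediction. Returns an (effect j, cause i, lag l) tensor, where
    lag index 0 is the most recent time step."""
    T, n = x.shape
    base = forecaster(x)
    sal = np.zeros((n, n, T))
    for t in range(T):
        for i in range(n):
            xp = x.copy()
            xp[t, i] += eps
            sal[:, i, T - 1 - t] = np.abs(forecaster(xp) - base) / eps
    return sal

def extract_graph(sal, threshold):
    """Keep edge i -> j at its argmax lag if the peak saliency is large
    enough; the lag is reported in steps before the predicted time."""
    n = sal.shape[0]
    edges = []
    for j in range(n):
        for i in range(n):
            lag = int(np.argmax(sal[j, i]))
            if sal[j, i, lag] > threshold:
                edges.append((i, j, lag + 1))
    return edges
```

For a toy forecaster whose first output depends only on variable 1 two steps back, this recovers the single edge `(1, 0, 2)`: both the causal direction and the temporal lag come out of the same sensitivity analysis, which mirrors the joint direction-and-lag estimation described in the abstract.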