🤖 AI Summary
Addressing two key challenges in time-series causal discovery—modeling nonlinear dependencies and mitigating spurious correlations—this paper proposes a prior-augmented multi-layer Transformer framework. Methodologically: (1) it introduces an attention masking mechanism that uniformly enforces causal exclusion constraints across all Transformer layers, explicitly encoding domain knowledge to suppress spurious causal relations; (2) it jointly estimates causal direction and temporal lag via gradient-based analysis of the predictive model. The core innovation lies in embedding structural priors into the entire attention computation process, enabling end-to-end co-learning of causal graphs and temporal dynamics. Evaluated on standard benchmarks, the method achieves a 12.8% improvement in causal discovery F1-score and attains 98.9% accuracy in causal lag estimation, significantly outperforming existing state-of-the-art approaches.
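The masking idea described above (a uniform additive mask that zeroes out attention from excluded causes at every layer) can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation; the variable names, `exclusion_mask`, and `masked_attention` are hypothetical, and a real Transformer would apply the same mask inside each layer's scaled dot-product attention.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def exclusion_mask(variables, excluded_edges):
    """Additive attention mask: -inf wherever a query variable (potential
    effect) must not attend to a key variable (an excluded cause)."""
    idx = {v: i for i, v in enumerate(variables)}
    mask = np.zeros((len(variables), len(variables)))
    for cause, effect in excluded_edges:
        mask[idx[effect], idx[cause]] = -np.inf
    return mask

def masked_attention(scores, mask):
    # The same mask is added to the raw scores in every layer, so the
    # attention weight on an excluded cause is exactly zero throughout.
    return softmax(scores + mask, axis=-1)
```

With `exclusion_mask(["x", "y", "z"], [("z", "x")])`, the attention weight from query `x` to key `z` is forced to zero while each row still sums to one, which is how a domain-knowledge exclusion suppresses a spurious causal link.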
📝 Abstract
We introduce a novel framework for temporal causal discovery and inference that addresses two key challenges: complex nonlinear dependencies and spurious correlations. Our approach employs a multi-layer Transformer-based time-series forecaster to capture long-range, nonlinear temporal relationships among variables. After training, we extract the underlying causal structure and associated time lags from the forecaster using gradient-based analysis, enabling the construction of a causal graph. To mitigate the impact of spurious causal relationships, we incorporate a prior knowledge integration mechanism based on attention masking, which consistently enforces user-specified causal exclusions across multiple Transformer layers. Extensive experiments show that our method significantly outperforms other state-of-the-art approaches, achieving a 12.8% improvement in F1-score for causal discovery and 98.9% accuracy in estimating causal lags.
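The gradient-based extraction step can be illustrated with a small sketch: treat the trained forecaster as a black box, score how strongly each output variable responds to each input variable at each lag, then read off edges and lags from the peak responses. This is an assumption-laden toy, not the paper's method: `grad_saliency` and `extract_graph` are hypothetical names, and finite differences stand in for the autograd gradients a real implementation would use.

```python
import numpy as np

def grad_saliency(forecaster, x, eps=1e-4):
    """Finite-difference saliency |d forecaster(x)[j] / d x[t, i]|.

    x: (T, n) input window; forecaster maps it to an (n,) next-step
    prediction. Returns an (effect j, cause i, lag l) tensor, where
    lag index 0 is the most recent time step."""
    T, n = x.shape
    base = forecaster(x)
    sal = np.zeros((n, n, T))
    for t in range(T):
        for i in range(n):
            xp = x.copy()
            xp[t, i] += eps
            sal[:, i, T - 1 - t] = np.abs(forecaster(xp) - base) / eps
    return sal

def extract_graph(sal, threshold):
    """Keep edge i -> j at its argmax lag if the peak saliency is large
    enough; the lag is reported in steps before the predicted time."""
    n = sal.shape[0]
    edges = []
    for j in range(n):
        for i in range(n):
            lag = int(np.argmax(sal[j, i]))
            if sal[j, i, lag] > threshold:
                edges.append((i, j, lag + 1))
    return edges
```

For a toy forecaster whose first output depends only on variable 1 two steps back, this recovers the single edge `(1, 0, 2)`: both the causal direction and the temporal lag come out of the same sensitivity analysis, which mirrors the joint direction-and-lag estimation described in the abstract.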