CausalCompass: Evaluating the Robustness of Time-Series Causal Discovery in Misspecified Scenarios

📅 2026-02-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the limited robustness of existing time series causal discovery methods under violations of modeling assumptions and the absence of a systematic evaluation benchmark. To this end, we propose CausalCompass—the first scalable evaluation framework tailored for scenarios involving assumption misspecification, encompassing eight canonical violation types. The framework integrates multiple representative algorithms and conducts comprehensive assessments incorporating both standardized and non-standardized preprocessing pipelines alongside hyperparameter sensitivity analyses. Experimental results demonstrate that no single method consistently outperforms others across all settings; however, deep learning–based approaches generally exhibit greater robustness, with NTS-NOTEARS showing pronounced sensitivity to standardization. This work establishes a systematic benchmark and offers empirical insights to advance research on robustness in time series causal discovery.

📝 Abstract
Causal discovery from time series is a fundamental task in machine learning. However, its widespread adoption is hindered by a reliance on untestable causal assumptions and by the lack of robustness-oriented evaluation in existing benchmarks. To address these challenges, we propose CausalCompass, a flexible and extensible benchmark suite designed to assess the robustness of time-series causal discovery (TSCD) methods under violations of modeling assumptions. To demonstrate the practical utility of CausalCompass, we conduct extensive benchmarking of representative TSCD algorithms across eight assumption-violation scenarios. Our experimental results indicate that no single method consistently attains optimal performance across all settings. Nevertheless, the methods exhibiting superior overall performance across diverse scenarios are almost invariably deep learning-based approaches. We further provide hyperparameter sensitivity analyses to deepen the understanding of these findings. We also find, somewhat surprisingly, that NTS-NOTEARS relies heavily on standardized preprocessing in practice, performing poorly in the vanilla setting but exhibiting strong performance after standardization. Finally, our work aims to provide a comprehensive and systematic evaluation of TSCD methods under assumption violations, thereby facilitating their broader adoption in real-world applications. The code and datasets are available at https://github.com/huiyang-yi/CausalCompass.
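The abstract highlights two mechanics worth making concrete: the standardization preprocessing step that NTS-NOTEARS is reported to depend on, and the comparison of an estimated causal graph against the ground truth. The sketch below is illustrative only and is not taken from the CausalCompass codebase: it shows per-variable z-score standardization of a multivariate time series, and the structural Hamming distance (SHD), a metric commonly used in causal-discovery benchmarks. The function names `standardize` and `shd` are my own, not identifiers from the paper.

```python
# Illustrative sketch (not the CausalCompass implementation).
import numpy as np

def standardize(X):
    """Z-score each variable (column) of a (T, d) time-series matrix."""
    mu = X.mean(axis=0, keepdims=True)
    sigma = X.std(axis=0, keepdims=True)
    # Guard against constant columns to avoid division by zero.
    return (X - mu) / np.where(sigma == 0, 1.0, sigma)

def shd(A_true, A_est):
    """Structural Hamming distance between two binary adjacency matrices:
    the number of edge additions, deletions, or reversals needed to turn
    the estimated graph into the true one."""
    diff = np.abs(A_true - A_est)
    # A reversed edge appears as two mismatches (i->j missing, j->i extra);
    # count each reversal once instead of twice.
    reversals = np.minimum(diff, diff.T).sum() // 2
    return int(diff.sum() - reversals)

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(100, 4))
Xs = standardize(X)
print(np.allclose(Xs.mean(axis=0), 0.0))  # True: each variable is centered

A_true = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
A_est  = np.array([[0, 0, 0], [1, 0, 1], [0, 0, 0]])  # first edge reversed
print(shd(A_true, A_est))  # 1: one edge reversal
```

Whether an algorithm's output SHD changes when `standardize` is applied to the input, as the paper reports for NTS-NOTEARS, is exactly the kind of preprocessing sensitivity the benchmark is designed to expose.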
Problem

Research questions and friction points this paper is trying to address.

causal discovery
time series
robustness
assumption violation
benchmark
Innovation

Methods, ideas, or system contributions that make the work stand out.

causal discovery
time series
robustness evaluation
benchmarking
assumption violation
Huiyang Yi
Southeast University
Xiaojian Shen
Jilin University
Yonggang Wu
Southeast University
Duxin Chen
Southeast University
He Wang
Southeast University
Wenwu Yu
Endowed Chair Professor, Southeast University, Nanjing, China
complex networks, multi-agent systems, networked collective intelligence, machine learning, UAVs