CausalRivers -- Scaling up benchmarking of causal discovery for real-world time-series

📅 2025-03-21
🤖 AI Summary
Problem: Existing causal discovery methods are evaluated predominantly on synthetic data and lack systematic evaluation on complex real-world temporal systems, such as hydrological dynamics, where distribution shifts and spatial dependencies are critical. Method: The paper introduces the largest real-world time-series causal discovery benchmark to date, comprising 15-minute-resolution river discharge observations from 1,160 hydrological stations across eastern Germany (666 stations) and Bavaria (494 stations) for 2019–2023, augmented with data from a flood around the Elbe River as an example of pronounced distribution shift. Two ground-truth causal graphs, one per region, are constructed from multiple sources of information and time-series metadata, and can be sampled to generate thousands of subgraphs. Contribution/Results: Several causal discovery approaches are evaluated on this benchmark in a set of experiments, exposing where they fall short in realistic settings and identifying areas for improvement.

📝 Abstract
Causal discovery, or identifying causal relationships from observational data, is a notoriously challenging task, with numerous methods proposed to tackle it. Despite this, in-the-wild evaluation of these methods is still lacking, as works frequently rely on synthetic data evaluation and sparse real-world examples under critical theoretical assumptions. Real-world causal structures, however, are often complex, making it hard to decide on a proper causal discovery strategy. To bridge this gap, we introduce CausalRivers, the largest in-the-wild causal discovery benchmarking kit for time-series data to date. CausalRivers features an extensive dataset on river discharge that covers the eastern German territory (666 measurement stations) and the state of Bavaria (494 measurement stations). It spans the years 2019 to 2023 with a 15-minute temporal resolution. Further, we provide additional data from a flood around the Elbe River, as an event with a pronounced distributional shift. Leveraging multiple sources of information and time-series meta-data, we constructed two distinct causal ground truth graphs (Bavaria and eastern Germany). These graphs can be sampled to generate thousands of subgraphs to benchmark causal discovery across diverse and challenging settings. To demonstrate the utility of CausalRivers, we evaluate several causal discovery approaches through a set of experiments to identify areas for improvement. CausalRivers has the potential to facilitate robust evaluations and comparisons of causal discovery methods. Besides this primary purpose, we also expect that this dataset will be relevant for connected areas of research, such as time-series forecasting and anomaly detection. Based on this, we hope to push benchmark-driven method development that fosters advanced techniques for causal discovery, as is the case for many other areas of machine learning.
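The abstract states that the ground-truth graphs can be sampled to generate thousands of subgraphs. The following is a minimal sketch of one way such sampling could work, not the benchmark's own sampling code: it grows a weakly connected set of stations from a random start and returns the induced edges. The function name `sample_connected_subgraph`, the edge-list representation, and the toy network are all assumptions made for illustration.

```python
import random


def sample_connected_subgraph(edges, n_nodes, rng):
    """Grow a weakly connected node set of size n_nodes; return nodes and induced edges."""
    # Undirected neighbor map, so growth can follow an edge in either direction.
    neighbors = {}
    for src, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)

    start = rng.choice(sorted(neighbors))
    chosen = {start}
    frontier = set(neighbors[start])
    while len(chosen) < n_nodes and frontier:
        nxt = rng.choice(sorted(frontier))
        chosen.add(nxt)
        frontier |= neighbors[nxt]
        frontier -= chosen  # never revisit nodes that are already selected

    if len(chosen) < n_nodes:
        raise ValueError("connected component is smaller than n_nodes")

    # Keep only the directed edges whose endpoints were both selected.
    induced = sorted((s, d) for s, d in edges if s in chosen and d in chosen)
    return sorted(chosen), induced


# Hypothetical toy river network: an edge (a, b) means station a flows into station b.
TOY_EDGES = [("A", "C"), ("B", "C"), ("C", "E"), ("D", "E"), ("E", "F")]

nodes, sub_edges = sample_connected_subgraph(TOY_EDGES, 3, random.Random(0))
print(nodes, sub_edges)
```

Calling the function repeatedly with different seeds and sizes yields many small ground-truth graphs from a single large one, which is the general idea behind benchmarking across diverse settings.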
Problem

Research questions and friction points this paper is trying to address.

Lack of real-world evaluation for causal discovery methods
Complex real-world causal structures challenge strategy selection
Need for large-scale benchmarking in time-series causal discovery
Innovation

Methods, ideas, or system contributions that make the work stand out.

Largest in-the-wild time-series causal benchmarking
Extensive river discharge dataset with ground truth
Diverse subgraphs for robust method evaluation
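
Evaluating a method against a ground-truth subgraph amounts to comparing predicted edges with true edges. The sketch below assumes directed-edge precision, recall, and F1 as the score; the source does not state which metrics the paper reports, and the name `edge_f1` and the example edge lists are hypothetical.

```python
def edge_f1(true_edges, predicted_edges):
    """Return (precision, recall, F1) over directed edges."""
    true_set, pred_set = set(true_edges), set(predicted_edges)
    true_positives = len(true_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(true_set) if true_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical ground truth and prediction; ("C", "B") is a reversed edge
# and therefore counts as both a false positive and a missed true edge.
truth = [("A", "C"), ("B", "C"), ("C", "E")]
predicted = [("A", "C"), ("C", "B"), ("C", "E"), ("A", "E")]

precision, recall, f1 = edge_f1(truth, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Scoring on directed edges penalizes a method that finds the right pair of stations but the wrong flow direction, which matters for river networks where direction is physically determined.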