Unitless Unrestricted Markov-Consistent SCM Generation: Better Benchmark Datasets for Causal Discovery

📅 2025-03-21

🤖 AI Summary

Causal discovery algorithms are predominantly evaluated on synthetic data, yet common generation methods introduce artifacts that may be nonphysical, notably *var*- and *R²*-sortability, where variable variances or coefficients of determination tend to increase along the causal order. Algorithms that exploit these patterns can look better on benchmarks than they are likely to be on real-world data. Method: The paper proposes a unitless, unrestricted, Markov-consistent structural causal model (SCM) generation framework, built on a way of drawing coefficients that the authors argue samples the space of SCMs more effectively, and extends it to time-series SCMs. Contribution/Results: The authors analyze which sortability patterns should be expected in real data and point out that internally-standardized SCMs (iSCMs), while avoiding *var*-sortability, exhibit a reversed *R²*-sortability pattern on denser graphs. The resulting benchmarks are intended to give a more realistic picture of how causal discovery algorithms will perform.

📝 Abstract

Causal discovery aims to extract qualitative causal knowledge in the form of causal graphs from data. Because causal ground truth is rarely known in the real world, simulated data plays a vital role in evaluating the performance of the various causal discovery algorithms proposed in the literature. However, recent work has highlighted certain artifacts of commonly used data generation techniques for a standard class of structural causal models (SCMs) that may be nonphysical, including var- and R²-sortability, where the variables' variances and coefficients of determination (R²) after regressing on all other variables, respectively, increase along the causal order. Some causal methods exploit such artifacts, leading to unrealistic expectations for their performance on real-world data. Some modifications have been proposed to remove these artifacts; notably, the internally-standardized structural causal model (iSCM) avoids var-sortability and largely alleviates R²-sortability on sparse causal graphs, but exhibits a reversed R²-sortability pattern for denser graphs not featured in the original iSCM work. We analyze which sortability patterns we expect to see in real data, and propose a method for drawing coefficients that we argue more effectively samples the space of SCMs. Finally, we propose a novel extension of our SCM generation method to the time series setting.
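
To make the two artifacts named in the abstract concrete, here is a minimal sketch, not taken from the paper. It assumes a hypothetical five-variable linear chain with unit edge weights and unit-variance Gaussian noise, and it uses a simplified edge-wise score (`edge_sortability`, a name introduced here for illustration) in place of the path-based sortability scores from the literature. R² after regressing on all other variables is computed from the inverse covariance matrix.

```python
import numpy as np


def simulate_chain(n_vars=5, n_samples=20000, weight=1.0, seed=0):
    """Sample a linear chain X_0 -> X_1 -> ... with unit-variance Gaussian noise."""
    rng = np.random.default_rng(seed)
    X = np.zeros((n_samples, n_vars))
    X[:, 0] = rng.standard_normal(n_samples)
    for j in range(1, n_vars):
        X[:, j] = weight * X[:, j - 1] + rng.standard_normal(n_samples)
    return X


def r2_given_all_others(X):
    """R^2 of regressing each column on all other columns.

    Uses the identity R^2_j = 1 - 1 / (Sigma_jj * Precision_jj).
    """
    cov = np.cov(X, rowvar=False)
    precision = np.linalg.inv(cov)
    return 1.0 - 1.0 / (np.diag(cov) * np.diag(precision))


def edge_sortability(values, edges):
    """Fraction of edges (parent, child) where the child's value is larger.

    Simplified edge-wise proxy (an assumption of this sketch), not the
    path-based score used in the sortability literature.
    """
    return float(np.mean([values[c] > values[p] for p, c in edges]))


X = simulate_chain()
edges = [(j - 1, j) for j in range(1, X.shape[1])]
variances = X.var(axis=0)
r2 = r2_given_all_others(X)
var_score = edge_sortability(variances, edges)
r2_score = edge_sortability(r2, edges)
print("variances:", np.round(variances, 2))
print("R^2:", np.round(r2, 2))
print("edge-wise var score:", var_score, "R^2 score:", r2_score)
```

With unit weights, the variance of each variable grows with its depth in the chain, which is exactly the kind of pattern a variance-sorting baseline could exploit to recover the causal order without learning anything causal.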

Problem

Research questions and friction points this paper is trying to address.

Eliminating unrealistic artifacts in SCM generation for causal discovery

Addressing var- and R²-sortability issues in benchmark datasets

Extending the SCM generation method to the time series setting
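
One previously proposed remedy for the var-sortability issue listed above is the internally-standardized SCM (iSCM) mentioned in the abstract. The sketch below illustrates that prior idea only, not the coefficient-drawing method this paper proposes. It assumes a hypothetical linear chain and assumes that each variable is standardized as it is generated, before being passed to its child; the function name and parameters are introduced here for illustration.

```python
import numpy as np


def simulate_chain_internally_standardized(n_vars=5, n_samples=20000,
                                           weight=1.0, seed=0):
    """Linear chain where each variable is standardized during generation.

    Assumption of this sketch: standardization happens right after a
    variable is computed, so children only ever see standardized parents.
    """
    rng = np.random.default_rng(seed)
    X = np.zeros((n_samples, n_vars))
    for j in range(n_vars):
        raw = rng.standard_normal(n_samples)  # exogenous noise term
        if j > 0:
            raw = raw + weight * X[:, j - 1]  # parent is already standardized
        X[:, j] = (raw - raw.mean()) / raw.std()
    return X


X_std = simulate_chain_internally_standardized()
print("variances:", np.round(X_std.var(axis=0), 3))
```

Because every variable has unit sample variance by construction, variance carries no information about the causal order in this sketch. Per the abstract, the R² pattern is a separate matter and can still be distorted on denser graphs.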

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates unrestricted Markov-consistent SCMs

Avoids var- and R²-sortability artifacts

Extends SCM generation to time series
