CausalProfiler: Generating Synthetic Benchmarks for Rigorous and Transparent Evaluation of Causal Machine Learning

📅 2025-11-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current causal machine learning (CML) evaluation relies heavily on a few hand-crafted or semi-synthetic datasets, yielding fragile conclusions that generalize poorly. To address this, we propose the first randomized synthetic causal benchmark generation framework with formal coverage guarantees and transparent assumptions, grounded in structural causal models (SCMs). Our framework enables systematic evaluation across the observational, interventional, and counterfactual layers of inference. It supports flexible control over causal graph structure, noise mechanisms, identifiability conditions, and query types, automatically generating diverse evaluation scenarios with known ground truth. Experiments demonstrate that the benchmark exposes performance disparities among state-of-the-art CML methods under varying data distributions and causal structures, and that it facilitates reproducible, diagnostic, cross-setting comparisons, substantially enhancing the rigor, interpretability, and scalability of causal ML evaluation.

📝 Abstract
Causal machine learning (Causal ML) aims to answer "what if" questions using machine learning algorithms, making it a promising tool for high-stakes decision-making. Yet, empirical evaluation practices in Causal ML remain limited. Existing benchmarks often rely on a handful of hand-crafted or semi-synthetic datasets, leading to brittle, non-generalizable conclusions. To bridge this gap, we introduce CausalProfiler, a synthetic benchmark generator for Causal ML methods. Based on a set of explicit design choices about the class of causal models, queries, and data considered, the CausalProfiler randomly samples causal models, data, queries, and ground truths constituting the synthetic causal benchmarks. In this way, Causal ML methods can be rigorously and transparently evaluated under a variety of conditions. This work offers the first random generator of synthetic causal benchmarks with coverage guarantees and transparent assumptions operating on the three levels of causal reasoning: observation, intervention, and counterfactual. We demonstrate its utility by evaluating several state-of-the-art methods under diverse conditions and assumptions, both in and out of the identification regime, illustrating the types of analyses and insights the CausalProfiler enables.
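Since the paper itself is only summarized here, the following is a minimal, hypothetical sketch of what one generated benchmark instance could look like, assuming a linear-Gaussian SCM over a randomly sampled DAG. The function name `sample_scm_benchmark` and all parameters are illustrative, not the CausalProfiler API.

```python
# Minimal sketch (not the paper's code): randomly sample a linear-Gaussian SCM
# over a random DAG, draw observational and interventional data, and compute an
# exact ground-truth interventional query. All names are illustrative only.
import numpy as np

def sample_scm_benchmark(n_nodes=4, edge_prob=0.5, n_samples=5_000, seed=0):
    rng = np.random.default_rng(seed)
    # Random DAG: upper-triangular weight matrix, so edges only go from lower to
    # higher node index and index order is already a topological order.
    weights = np.triu(rng.uniform(-1.0, 1.0, size=(n_nodes, n_nodes)), k=1)
    weights *= rng.random((n_nodes, n_nodes)) < edge_prob

    def simulate(do_node=None, do_value=None):
        # Ancestral sampling; do(X_j = v) overrides the structural assignment of X_j.
        X = np.zeros((n_samples, n_nodes))
        for j in range(n_nodes):
            if do_node == j:
                X[:, j] = do_value
            else:
                X[:, j] = X @ weights[:, j] + rng.normal(size=n_samples)
        return X

    observational = simulate()                            # observational layer
    interventional = simulate(do_node=0, do_value=1.0)    # interventional layer: do(X_0 = 1)
    # Exact ground truth for a linear SCM: the total effect of X_0 on X_{n-1}
    # is the (0, n-1) entry of (I - W)^{-1}, i.e. the sum over all directed paths.
    true_ate = np.linalg.inv(np.eye(n_nodes) - weights)[0, -1]
    return weights, observational, interventional, true_ate

weights, obs, intv, true_ate = sample_scm_benchmark()
print("ground-truth total effect of X_0 on the last node:", round(float(true_ate), 3))
```

A generator in the spirit of the paper would additionally randomize noise families, query types, and identifiability conditions; this sketch fixes all of them for brevity.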
Problem

Research questions and friction points this paper is trying to address.

Existing Causal ML evaluation relies on a handful of hand-crafted or semi-synthetic datasets, yielding brittle, non-generalizable conclusions
Lack of benchmarks with known ground truth spanning the observational, interventional, and counterfactual levels of causal reasoning
Need for rigorous, transparent, and reproducible comparison of Causal ML methods under varied conditions and assumptions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Randomly samples causal models, data, queries, and ground truths to assemble synthetic benchmarks
Provides formal coverage guarantees and transparent assumptions across the observational, interventional, and counterfactual levels
Enables rigorous, diagnostic evaluation under diverse conditions and assumptions, both in and out of the identification regime (see the sketch below)
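As a hypothetical illustration (not taken from the paper) of the diagnostic comparisons such a benchmark enables, the sketch below scores a naive estimator and a backdoor-adjusted estimator against the known ground-truth effect on a randomly drawn confounded SCM; all names are illustrative.

```python
# Hypothetical sketch: score two estimators against a known ground-truth effect on
# a randomly drawn confounded SCM (Z -> T, Z -> Y, T -> Y). Names are illustrative.
import numpy as np

rng = np.random.default_rng(1)
a, b, c = rng.uniform(0.5, 2.0, size=3)   # random edge coefficients Z->T, Z->Y, T->Y
n = 10_000
Z = rng.normal(size=n)
T = a * Z + rng.normal(size=n)
Y = b * Z + c * T + rng.normal(size=n)
true_effect = c                            # ground-truth causal effect of T on Y

def ols_slope(features, target):
    # Least-squares fit with an intercept; return the coefficient of the first feature.
    X = np.column_stack(features + [np.ones(len(target))])
    return np.linalg.lstsq(X, target, rcond=None)[0][0]

naive = ols_slope([T], Y)        # ignores the confounder Z, so it is biased
adjusted = ols_slope([T, Z], Y)  # backdoor adjustment for Z recovers the effect

print(f"true {true_effect:.3f} | naive error {abs(naive - true_effect):.3f} "
      f"| adjusted error {abs(adjusted - true_effect):.3f}")
```

Because the ground truth is known by construction, estimator error can be reported exactly, which is what lets a benchmark of this kind expose performance disparities across methods and settings.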