Sanity Checking Causal Representation Learning on a Simple Real-World System

📅 2025-02-27
📈 Citations: 0
Influential: 0

🤖 AI Summary
Causal representation learning (CRL) promises to discover latent causal variables from observational data, yet its empirical validity in real-world physical systems remains largely untested. Method: We design a controllable optical experimental platform that provides ground-truth causal factors—a first-of-its-kind real-world benchmark—and systematically evaluate mainstream CRL algorithms (e.g., iVAE, Pairwise-IGSP) for reproducibility and robustness to theoretical assumptions. Combining controlled optical experiments with ablation studies on synthetic data, we assess recovery accuracy under varying levels of realism and model violation. Contribution/Results: None of the tested methods reliably recover the true causal factors—even on simplified synthetic data, most fail. Crucially, core theoretical assumptions—particularly invertibility of mixing functions—are empirically violated in practice, constituting a fundamental bottleneck. Our work exposes a substantial gap between CRL’s theoretical guarantees and real-world performance, providing critical empirical evidence and concrete directions for future methodological development.
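The invertibility violation highlighted above can be illustrated with a toy sensor model (purely illustrative; this is not the paper's actual optical setup): a linear mixing followed by sensor saturation is many-to-one, so no method can recover the causal factors from the observations alone.

```python
import numpy as np

def camera_mixing(z, A):
    """Toy mixing function: linear optics followed by sensor saturation.
    Clipping makes the map many-to-one, violating invertibility."""
    return np.clip(A @ z, 0.0, 1.0)

A = np.array([[1.0, 0.5],
              [0.2, 1.0]])
z1 = np.array([2.0, 2.0])   # both sensor channels saturate
z2 = np.array([3.0, 2.5])   # different causes, same saturated image
x1, x2 = camera_mixing(z1, A), camera_mixing(z2, A)
print(np.allclose(x1, x2))  # True: two distinct z produce identical x
```

When distinct factor settings collapse onto the same observation, identifiability guarantees that presuppose an invertible mixing no longer apply.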

📝 Abstract
We evaluate methods for causal representation learning (CRL) on a simple, real-world system where these methods are expected to work. The system consists of a controlled optical experiment specifically built for this purpose, which satisfies the core assumptions of CRL and where the underlying causal factors (the inputs to the experiment) are known, providing a ground truth. We select methods representative of different approaches to CRL and find that they all fail to recover the underlying causal factors. To understand the failure modes of the evaluated algorithms, we perform an ablation on the data by substituting the real data-generating process with a simpler synthetic equivalent. The results reveal a reproducibility problem, as most methods already fail on this synthetic ablation despite its simple data-generating process. Additionally, we observe that common assumptions on the mixing function are crucial for the performance of some of the methods but do not hold in the real data. Our efforts highlight the contrast between the theoretical promise of the state of the art and the challenges in its application. We hope the benchmark serves as a simple, real-world sanity check to further develop and validate methodology, bridging the gap towards CRL methods that work in practice. We make all code and datasets publicly available at github.com/simonbing/CRLSanityCheck
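Because the experiment's inputs provide ground-truth causal factors, recovery can be scored directly. A standard metric in the CRL literature for this is the mean correlation coefficient (MCC); a minimal sketch follows (the function name and data are illustrative, not taken from the paper's codebase):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def mean_correlation_coefficient(z_true, z_hat):
    """MCC: match each learned latent to a ground-truth factor and
    average the absolute Pearson correlations of the matched pairs."""
    d = z_true.shape[1]
    # Cross-correlations between every (true, learned) dimension pair.
    corr = np.abs(np.corrcoef(z_true, z_hat, rowvar=False)[:d, d:])
    # Hungarian assignment: permutation maximizing total correlation.
    row, col = linear_sum_assignment(-corr)
    return corr[row, col].mean()

# Example: learned latents are a permuted, scaled copy of the true factors.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 3))
z_hat = 2.0 * z[:, [2, 0, 1]] + 0.01 * rng.normal(size=(500, 3))
print(round(mean_correlation_coefficient(z, z_hat), 2))  # ≈ 1.0
```

An MCC near 1 indicates the factors were recovered up to permutation and scaling; the failures reported here correspond to scores well below that.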
Problem

Research questions and friction points this paper is trying to address.

Evaluating CRL methods on a real-world optical system.
Identifying reproducibility failure modes in CRL algorithms.
Challenging common assumptions on the mixing function in CRL.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal Representation Learning methods
Controlled optical experiment
Synthetic data ablation