🤖 AI Summary
This study addresses the unsupervised discovery of unknown causal relationships in large-scale randomized controlled trials. We propose Neural Effect Search (NES), a novel framework that combines representations from pretrained foundation models, sparse-autoencoder-based feature disentanglement, and a recursive, hierarchical effect search to systematically mitigate multiple testing and effect entanglement. NES is the first method to achieve end-to-end, hypothesis-free identification of causal effects in a real-world scientific trial. Evaluated on semi-synthetic data and validated empirically in experimental ecology, NES detects strong causal effects overlooked by conventional hypothesis-driven approaches, achieving high precision while remaining robust and scalable. The framework establishes a generalizable, unsupervised paradigm for causal discovery.
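The following minimal Python sketch makes the multiple-testing issue concrete. It shows a naive version of the neural-level testing step. It assumes a standard ReLU sparse-autoencoder encoder, a per-feature Welch difference-in-means test with a large-sample normal approximation, and a Bonferroni correction. The function names (`sae_encode`, `welch_z_pvalues`, `bonferroni_hits`) and the choice of test are hypothetical illustrations, not the NES implementation.

```python
import math

import numpy as np


def sae_encode(x, w_enc, b_enc, b_dec):
    """Assumed ReLU SAE encoder: codes = ReLU((x - b_dec) @ w_enc + b_enc)."""
    return np.maximum((x - b_dec) @ w_enc + b_enc, 0.0)


def welch_z_pvalues(z, t):
    """Two-sided p-value per feature for a difference in mean activation
    between treated (t == 1) and control (t == 0) units, using Welch's
    statistic with a normal approximation (valid for large samples)."""
    z1, z0 = z[t == 1], z[t == 0]
    diff = z1.mean(axis=0) - z0.mean(axis=0)
    se = np.sqrt(z1.var(axis=0, ddof=1) / len(z1) + z0.var(axis=0, ddof=1) / len(z0))
    # Features that are constant in both arms get a statistic of 0 (p = 1).
    stat = np.divide(diff, se, out=np.zeros_like(diff), where=se > 0)
    return np.array([math.erfc(abs(s) / math.sqrt(2)) for s in stat])


def bonferroni_hits(pvals, alpha=0.05):
    """Indices of features significant after Bonferroni correction."""
    return np.flatnonzero(pvals < alpha / len(pvals))
```

With thousands of SAE features, the per-feature threshold `alpha / d` becomes very strict. This is one way to see why testing every neuron independently scales poorly.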
📝 Abstract
Randomized controlled trials are one of the pillars of science; nevertheless, they rely on hand-crafted hypotheses and expensive analysis. Such constraints prevent causal effect estimation at scale and can anchor analyses on popular yet incomplete hypotheses. We propose to discover the unknown effects of a treatment directly from data. To do so, we turn unstructured trial data into meaningful representations via pretrained foundation models and interpret them with a sparse autoencoder. However, discovering significant causal effects at the neural level is not trivial because of multiple testing and effect entanglement. To address these challenges, we introduce Neural Effect Search, a novel recursive procedure that solves both issues through progressive stratification. After assessing the robustness of our algorithm on semi-synthetic experiments, we showcase, in the context of experimental ecology, the first successful unsupervised causal effect identification on a real-world scientific trial.
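The abstract does not spell out the recursion. The sketch below is one possible reading of "progressive stratification" and should not be taken as the authors' algorithm. Each round makes the following moves:

1. Pick the most significant SAE feature.
2. Split every current stratum into quantile bins of that feature's activation.
3. Re-test the remaining features with an inverse-variance-weighted stratified difference in means.

The function names (`stratified_pvalue`, `refine_strata`, `greedy_stratified_search`), the binning rule, and the per-level Bonferroni threshold are all hypothetical choices. They make no claim about how NES controls error.

```python
import math

import numpy as np


def stratified_pvalue(y, t, strata):
    """Two-sided p-value for a treatment effect on y, pooling per-stratum
    differences in means with inverse-variance weights (normal approximation)."""
    num, den = 0.0, 0.0
    for idx in strata:
        y_s, t_s = y[idx], t[idx]
        y1, y0 = y_s[t_s == 1], y_s[t_s == 0]
        if len(y1) < 2 or len(y0) < 2:
            continue  # too few units in an arm to estimate a variance
        var = y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0)
        if var <= 0:
            continue  # y is constant in this stratum: no information
        num += (y1.mean() - y0.mean()) / var
        den += 1.0 / var
    if den == 0:
        return 1.0
    stat = num / math.sqrt(den)  # pooled estimate divided by its standard error
    return math.erfc(abs(stat) / math.sqrt(2))


def refine_strata(strata, a, n_bins):
    """Split every stratum into at most n_bins quantile bins of activation a."""
    refined = []
    for idx in strata:
        edges = np.quantile(a[idx], np.linspace(0, 1, n_bins + 1)[1:-1])
        labels = np.searchsorted(edges, a[idx], side="right")
        refined.extend(idx[labels == b] for b in range(n_bins))
    return [s for s in refined if len(s) > 0]


def greedy_stratified_search(z, t, alpha=0.05, n_bins=4, max_depth=3):
    """Hypothetical greedy search: select the most significant feature,
    stratify on it, then re-test the remaining features within the strata."""
    n, d = z.shape
    strata = [np.arange(n)]
    remaining = list(range(d))
    found = []
    for _ in range(max_depth):
        if not remaining:
            break
        pvals = [stratified_pvalue(z[:, j], t, strata) for j in remaining]
        k = int(np.argmin(pvals))
        if pvals[k] >= alpha / len(remaining):  # Bonferroni at this level
            break
        found.append(remaining.pop(k))
        strata = refine_strata(strata, z[:, found[-1]], n_bins)
    return found
```

The strata are defined by post-treatment activations. A feature that is still selected therefore shows an effect not explained by the features already chosen. A neuron that fires only when a selected neuron does loses its signal after stratification, which is our reading of the entanglement problem.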