🤖 AI Summary
In causal inference, observational data are often compatible with multiple causal graphs, creating structural uncertainty. Bayesian model averaging (BMA) handles this uncertainty in a principled way, but its exact computation is intractable because the space of causal graphs grows super-exponentially with the number of nodes. To address this, we propose MACE-TNP, the first framework to combine meta-learning with Transformer neural processes (TNPs) to approximate BMA over causal structures end-to-end and directly predict posterior interventional distributions. Crucially, MACE-TNP bypasses explicit causal discovery and graph enumeration, dramatically improving scalability and inference efficiency. Experiments show that MACE-TNP significantly outperforms strong baselines at estimating interventional effects while generalizing robustly across datasets and unseen mechanisms. This work establishes a novel, efficient, and scalable paradigm for causal inference under structural uncertainty in high-dimensional settings.
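To make the super-exponential growth concrete: the number of labelled DAGs on `n` nodes can be counted with Robinson's recurrence. A minimal Python sketch (our illustration, not part of the paper):

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(n: int) -> int:
    """Robinson's recurrence: count labelled DAGs on n nodes."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes that have no incoming edges.
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
        for k in range(1, n + 1)
    )

for n in range(1, 8):
    print(n, num_dags(n))  # 1, 3, 25, 543, 29281, 3781503, 1138779265
```

Already at seven nodes there are over a billion candidate structures, so any method that enumerates graphs explicitly stops scaling almost immediately.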
📄 Abstract
In scientific domains, from biology to the social sciences, many questions boil down to: "What effect will we observe if we intervene on a particular variable?" If the causal relationships (e.g., a causal graph) are known, the intervention distributions can be estimated. In the absence of this domain knowledge, the causal structure must be discovered from the available observational data. However, observational data are often compatible with multiple causal graphs, making methods that commit to a single structure prone to overconfidence. A principled way to manage this structural uncertainty is Bayesian inference, which averages over a posterior distribution on possible causal structures and functional mechanisms. Unfortunately, the number of causal structures grows super-exponentially with the number of nodes in the graph, making these computations intractable. We propose to circumvent these challenges by using meta-learning to create an end-to-end model: the Model-Averaged Causal Estimation Transformer Neural Process (MACE-TNP). The model is trained to predict the Bayesian model-averaged interventional posterior distribution, and its end-to-end nature bypasses the need for expensive calculations. Empirically, we demonstrate that MACE-TNP outperforms strong Bayesian baselines. Our work establishes meta-learning as a flexible and scalable paradigm for approximating complex Bayesian causal inference, one that can be scaled to increasingly challenging settings in the future.
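A toy illustration of why model averaging matters (our own sketch, not the paper's method): with two variables and linear-Gaussian mechanisms, the graphs X→Y and Y→X are Markov equivalent, so their maximized observational likelihoods coincide and the posterior weights stay at roughly 0.5 each, yet the two graphs give different answers to do(X). The model-averaged interventional estimate blends both:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(scale=0.5, size=n)   # ground truth: X -> Y
xc, yc = x - x.mean(), y - y.mean()

def gauss_ll(resid):
    # Maximized Gaussian log-likelihood with the plug-in variance estimate.
    s2 = resid.var()
    return -0.5 * len(resid) * (np.log(2 * np.pi * s2) + 1)

# Graph G1 (X -> Y): factorization p(x) * p(y | x)
b1 = xc @ yc / (xc @ xc)
ll1 = gauss_ll(xc) + gauss_ll(yc - b1 * xc)

# Graph G2 (Y -> X): factorization p(y) * p(x | y)
b2 = xc @ yc / (yc @ yc)
ll2 = gauss_ll(yc) + gauss_ll(xc - b2 * yc)

# Both models have the same complexity, so posterior weights come from the
# likelihoods alone; Markov equivalence makes them (numerically) 0.5 each.
w1 = 1.0 / (1.0 + np.exp(ll2 - ll1))
w2 = 1.0 - w1

# E[Y | do(X = 1)] under each graph, then the Bayesian model average.
mu_g1 = y.mean() + b1 * (1.0 - x.mean())  # Y responds to the intervention
mu_g2 = y.mean()                          # under G2, do(X) leaves Y untouched
bma_mu = w1 * mu_g1 + w2 * mu_g2
print(w1, mu_g1, mu_g2, bma_mu)
```

Committing to either single graph would return `mu_g1` or `mu_g2` with full confidence; the averaged estimate `bma_mu` honestly reflects that the data cannot distinguish the two structures. MACE-TNP is trained to output such model-averaged interventional posteriors directly, without ever enumerating the graphs.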