🤖 AI Summary
Traditional data-driven astrophysics relies on statistical correlation tests to validate physical theories, but such tests cannot identify direct causal relationships, causal directions, or unobserved confounders. To address this limitation, we propose a causal discovery algorithm tailored to astronomical data, integrating conditional independence testing and constraint-based structure learning. Applied to ~500,000 low-redshift galaxies from the NASA-Sloan Atlas, the method constructs a causal network over galaxy properties. This represents the first causal disambiguation of physically degenerate mechanisms at the scale of a galaxy survey (distinguishing, for instance, whether stellar mass drives gas depletion or vice versa), and it explicitly uncovers direct causal pathways among key variables (e.g., stellar mass, gas content, morphology) alongside latent confounding effects. By embedding causal inference into astrophysical data analysis, the work aims to make galaxy evolution models more interpretable and more faithful to the underlying physical mechanisms.
📝 Abstract
Data-driven astrophysics currently relies on the detection and characterisation of correlations between objects' properties, which are then used to test physical theories that make predictions for them. This process fails to utilise information in the data that forms a crucial part of the theories' predictions, namely which variables are directly correlated (as opposed to accidentally correlated through others), the directions of these determinations, and the presence or absence of confounders that correlate variables in the dataset but are themselves absent from it. We propose to recover this information through causal discovery, a well-developed methodology for inferring the causal structure of datasets that is, however, almost entirely unknown to astrophysics. We develop a causal discovery algorithm suitable for astrophysical datasets and illustrate it on $\sim 5\times10^5$ low-redshift galaxies from the NASA-Sloan Atlas, demonstrating its ability to distinguish physical mechanisms that are degenerate on the basis of correlations alone.
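To make the distinction between direct and indirect correlation concrete, the following is a minimal sketch of the kind of conditional independence test that constraint-based causal discovery builds on. It is not the paper's algorithm: the variables `a`, `b`, `c`, the linear-Gaussian chain a → b → c, the coefficients, and the helper `partial_corr` are all assumptions chosen for illustration, and the data are synthetic rather than drawn from the NASA-Sloan Atlas.

```python
import numpy as np


def partial_corr(x, y, z):
    """Correlation of x and y after linearly regressing z out of both.

    Hypothetical helper for illustration; assumes linear relationships.
    """
    design = np.column_stack([np.ones_like(z), z])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return np.corrcoef(res_x, res_y)[0, 1]


# Synthetic chain a -> b -> c (assumed structure, not real galaxy data).
rng = np.random.default_rng(0)
n = 20_000
a = rng.normal(size=n)
b = 0.8 * a + rng.normal(size=n)
c = 0.8 * b + rng.normal(size=n)

# a and c are clearly correlated, but only through b.
marginal_ac = np.corrcoef(a, c)[0, 1]

# Conditioning on b removes the a-c dependence (indirect link).
partial_ac_given_b = partial_corr(a, c, b)

# Conditioning on c does not remove the a-b dependence (direct link).
partial_ab_given_c = partial_corr(a, b, c)

print(f"corr(a, c)     = {marginal_ac:.3f}")
print(f"corr(a, c | b) = {partial_ac_given_b:.3f}")
print(f"corr(a, b | c) = {partial_ab_given_c:.3f}")
```

In this toy setting the marginal correlation between `a` and `c` is substantial, yet it vanishes (up to sampling noise) once `b` is held fixed, which is the signature of an indirect link. A vanishing partial correlation only removes an edge; it does not by itself fix the direction of the remaining ones, which requires further structural reasoning.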