🤖 AI Summary
Standard diffusion models do not encode causal structure, so on their own they do not support interventional sampling or causal inference. This work proposes a causality-encoded diffusion framework that incorporates a known directed acyclic graph by training conditional diffusion models consistent with the graph factorization. Interventional samples are drawn by fixing the intervened variables and propagating their effects through the graph during reverse diffusion, and a resampling-based test for directed edges generates null replicates under a candidate graph. The theory establishes convergence of the observational and interventional distribution estimates, with rates governed by the maximum local dimension rather than the ambient dimension, and asymptotic control of Type I error for the edge test. In simulations, the approach recovers interventional distributions more accurately than baselines, and the edge test attains near-nominal size with favorable power. An application to flow cytometry data uses the method to assess disputed signaling linkages.
📝 Abstract
Standard diffusion models are flexible estimators of complex distributions, but they do not encode causal structures and therefore do not by themselves support causal analysis. We propose a causality-encoded diffusion framework that incorporates a known directed acyclic graph by training conditional diffusion models consistent with the graph factorisation. The resulting sampler approximately recovers the observational distribution and enables interventional sampling by fixing intervened variables while propagating effects through the graph during reverse diffusion. Building on this interventional simulator, we develop a resampling-based test for directed edges that generates null replicates under a candidate graph. We establish convergence guarantees for observational and interventional distribution estimation, with rates governed by the maximum local dimension rather than the ambient dimension, and prove asymptotic control of type I error for the edge test. Simulations show improved interventional distribution recovery relative to baselines, with near-nominal size and favourable power in inference. An application to flow cytometry data demonstrates practical utility of the proposed method in assessing disputed signalling linkages.
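The interventional sampling step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the three-node graph, the linear weights, and the names `PARENTS`, `WEIGHTS`, `sample_conditional`, and `sample` are all hypothetical, and each conditional sampler is a closed-form linear-Gaussian stand-in for what would be a trained conditional diffusion model. The sketch shows only the graph-ordered logic: intervened variables are clamped, and every other variable is drawn given its already-sampled parents.

```python
import numpy as np
from graphlib import TopologicalSorter

# Hypothetical toy DAG (node -> list of parents): X -> Y, X -> Z, Y -> Z.
PARENTS = {"X": [], "Y": ["X"], "Z": ["X", "Y"]}

# Assumed linear weights for the stand-in conditionals. In the framework
# described above, each conditional would be a trained conditional
# diffusion model rather than a closed-form sampler.
WEIGHTS = {"Y": {"X": 2.0}, "Z": {"X": 1.0, "Y": -1.0}}


def sample_conditional(node, parent_values, n, rng):
    """Stand-in for drawing n samples from p(node | parents)."""
    mean = np.zeros(n)
    for parent, value in parent_values.items():
        mean = mean + WEIGHTS[node][parent] * value
    return mean + rng.standard_normal(n)


def sample(n, rng, do=None):
    """Draw n joint samples, optionally under an intervention do={node: value}."""
    do = do or {}
    values = {}
    # static_order() yields each node only after all of its parents.
    for node in TopologicalSorter(PARENTS).static_order():
        if node in do:
            # Intervened variable: clamp it and ignore its parents.
            values[node] = np.full(n, float(do[node]))
        else:
            parent_values = {p: values[p] for p in PARENTS[node]}
            values[node] = sample_conditional(node, parent_values, n, rng)
    return values


rng = np.random.default_rng(0)
observational = sample(10_000, rng)
interventional = sample(10_000, rng, do={"Y": 3.0})
```

Under `do={"Y": 3.0}`, the descendant `Z` shifts because it is sampled given the clamped `Y`, while the upstream variable `X` keeps its observational distribution. That asymmetry is what distinguishes intervening on a variable from conditioning on it.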