🤖 AI Summary
This work addresses the problem of learning causal abstraction mappings between high-level and low-level structural causal models (SCMs) under realistic constraints: inaccessible SCMs, purely observational data, and sample misalignment. Methodologically, we propose a category-theoretic modeling framework grounded in the semantic embedding principle, formally characterize the embedding constraints as geometric conditions on the Stiefel manifold, and develop a general Riemannian-optimization formulation of causal abstraction learning. We design three Riemannian learning algorithms that minimize the KL divergence between Gaussian measures, handling the non-convexity inherent in the causal abstraction learning task. Empirically, our approach robustly recovers causal abstraction structures on both synthetic benchmarks and real-world human brain neuroimaging data, remaining stable even under severe scarcity of prior knowledge.
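The objective sketched above can be made concrete for the linear Gaussian case: a linear causal abstraction is a matrix `T` with orthonormal columns (a point on the Stiefel manifold), the low-level Gaussian measure is pushed forward through `x ↦ Tᵀx`, and the KL divergence to the high-level Gaussian measure is the loss. The following is a minimal sketch, not the paper's implementation; all names (`kl_gaussian`, `random_stiefel`), dimensions, and the assumed-known high-level measure are illustrative choices.

```python
import numpy as np

def kl_gaussian(mu0, S0, mu1, S1):
    """Closed-form KL( N(mu0, S0) || N(mu1, S1) )."""
    k = mu0.shape[0]
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0)
                  + diff @ S1_inv @ diff
                  - k
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def random_stiefel(d, k, rng):
    """A random d x k matrix with orthonormal columns, i.e. a point on St(d, k)."""
    Q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    return Q

rng = np.random.default_rng(0)
d, k = 5, 2                            # low- and high-level dimensions (assumed)
A = rng.standard_normal((d, d))
S_low = A @ A.T + d * np.eye(d)        # low-level covariance (SPD by construction)
mu_low = rng.standard_normal(d)

T = random_stiefel(d, k, rng)          # candidate linear abstraction map
mu_push = T.T @ mu_low                 # pushforward of the low-level Gaussian
S_push = T.T @ S_low @ T               # through the linear map x -> T^T x

mu_high = rng.standard_normal(k)       # high-level measure, assumed known here
S_high = np.eye(k)

loss = kl_gaussian(mu_push, S_push, mu_high, S_high)
```

Minimizing `loss` over `T` subject to `TᵀT = I` is the non-convex problem the Riemannian algorithms target.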
📝 Abstract
Structural causal models (SCMs) allow us to investigate complex systems at multiple levels of resolution. The causal abstraction (CA) framework formalizes the mapping between high- and low-level SCMs. We address CA learning in a challenging and realistic setting, where SCMs are inaccessible, interventional data is unavailable, and sample data is misaligned. A key principle of our framework is $\textit{semantic embedding}$, formalized as the high-level distribution lying on a subspace of the low-level one. This principle naturally links linear CA to the geometry of the $\textit{Stiefel manifold}$. We present a category-theoretic approach to SCMs that enables the learning of a CA by finding a morphism between the low- and high-level probability measures, adhering to the semantic embedding principle. Consequently, we formulate a general CA learning problem. As an application, we solve the latter problem for linear CA, considering Gaussian measures and the Kullback-Leibler divergence as the objective. Given the nonconvexity of the learning task, we develop three algorithms building upon existing paradigms for Riemannian optimization. We demonstrate that the proposed methods succeed on both synthetic and real-world brain data with different degrees of prior information about the structure of CA.
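The building blocks of Riemannian optimization on the Stiefel manifold mentioned in the abstract can be sketched as follows: project the Euclidean gradient onto the tangent space at the current point, take a step, and retract back onto the manifold via a QR decomposition. This is a generic sketch of plain Riemannian gradient descent under the embedded Euclidean metric, not one of the paper's three algorithms; the toy trace objective below stands in for the actual KL loss, and all names are illustrative.

```python
import numpy as np

def proj_tangent(T, G):
    """Project a Euclidean gradient G onto the tangent space of St(d, k) at T."""
    sym = (T.T @ G + G.T @ T) / 2.0
    return G - T @ sym

def qr_retract(T, xi):
    """QR retraction: map the stepped point T + xi back onto the manifold."""
    Q, R = np.linalg.qr(T + xi)
    return Q * np.sign(np.diag(R))   # fix column signs for a canonical Q

def riemannian_gd(euclid_grad, T0, step=5e-3, iters=200):
    """Plain Riemannian gradient descent with a fixed step size (a sketch;
    the paper's algorithms build on more elaborate Riemannian paradigms)."""
    T = T0
    for _ in range(iters):
        xi = -step * proj_tangent(T, euclid_grad(T))
        T = qr_retract(T, xi)
    return T

# Toy example: maximize tr(T^T M T), i.e. find a dominant 2-D subspace of M.
rng = np.random.default_rng(1)
d, k = 6, 2
B = rng.standard_normal((d, d))
M = B @ B.T                                  # SPD matrix
f = lambda T: -np.trace(T.T @ M @ T)         # stand-in objective (not the KL loss)
g = lambda T: -2.0 * M @ T                   # its Euclidean gradient
T0, _ = np.linalg.qr(rng.standard_normal((d, k)))
T = riemannian_gd(g, T0)
```

The same project-step-retract loop applies unchanged when `f` and `g` are replaced by the KL objective and its gradient.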